WordPress & SEO Chris Gray on 10 May 2007 04:29 pm
One way to avoid Google’s Supplemental Index
I ran across a post over at JohnTP.com on how to Create A Robots.txt File And Increase Your Search Engine Rankings. While I don’t agree that using a robots.txt file will
actually increase your search engine rankings, I do know that robots.txt is a must-have
if you want to avoid Google’s duplicate content penalty (also known as “Google Hell”).
What is Google Hell you might ask?
Google Hell refers to Google’s supplemental index. The supplemental index
is where pages go when they are marked as either duplicate, no content, or orphaned
(no incoming/outgoing links). The problem with ending up in the supplemental index
is that supplemental pages do not show up in the main search results page. Instead,
these pages are buried under a link at the bottom of the SERPs (i.e. “repeat
the search with omitted results included”). I don’t know about
you, but I never click on this link…

Some of my pages are currently in the supplemental index because they have been
marked as duplicate content. You can check to see if pages from your site are
in the supplemental index by typing “site:yourdomain.com” into a Google
search prompt. Any page listed as a “Supplemental Result” is in the
supplemental index.
OK, my pages are in the supplemental index. How do I get out?
Here is where robots.txt comes in handy. Robots.txt is a small text
file that resides in the root directory of your website. This file tells
search engine spiders which files and directories they may or may not crawl.
You can include/exclude individual files and directories or, for search engines
that support them, use wildcards to block specific directory/file patterns.
All major search engines support robots.txt and will attempt to read it prior
to visiting your site. The entire Robots Exclusion
Standard is located at http://www.robotstxt.org.
I ended up implementing John’s robots.txt, but will probably end up changing to the version recommended by the WordPress Codex:
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
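Before uploading rules like these, it can help to check which URLs they would actually block. The following is only a minimal sketch using Python's standard urllib.robotparser module. The "User-agent: *" line, the blocked_paths helper, and the sample paths are my own assumptions for illustration, not part of the Codex recommendation, and real crawlers may interpret rules differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post. The "User-agent: *" line is an assumption added
# here so that the Disallow lines apply to every crawler.
RULES = """\
User-agent: *
Disallow: /wp-content/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /cgi-bin/
"""


def blocked_paths(rules: str, paths: list[str], agent: str = "*") -> dict[str, bool]:
    """Map each path to True if the rules forbid `agent` from crawling it.

    Hypothetical helper for illustration; it relies on the standard
    library's plain prefix matching, with no wildcard support.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {path: not parser.can_fetch(agent, path) for path in paths}


if __name__ == "__main__":
    # Made-up sample paths for a typical WordPress blog.
    sample = [
        "/",
        "/2007/05/some-post/",
        "/2007/05/some-post/feed/",
        "/feed/",
        "/wp-admin/",
        "/wp-login.php",
    ]
    for path, blocked in blocked_paths(RULES, sample).items():
        print(f"{'blocked' if blocked else 'allowed'}  {path}")
```

One thing this sketch makes visible: under plain prefix matching, "Disallow: /feed/" only covers the site-wide feed at /feed/. A per-post feed such as /2007/05/some-post/feed/ is still crawlable, so it may need a rule of its own.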
Unfortunately WordPress 2.1.3 does not come standard (to my knowledge) with a
default robots.txt and some of my pages ended up in the supplemental index before
I was able to catch it and add one. Hopefully this post will help you if you are
planning on going live with a new site.

on 14 May 2007 at 3:24 pm 1.Vince Cordic said …
I made the mistake of not putting up a robots file and disallowing access to the feed and whatnot, and of course I paid for it.
I was just too busy messing around with everything else.
But anyhow, because I didn’t disallow access to my RSS feed, a lot of my posts ended up in the supplemental results.
However, I believe pages with very few inbound links also end up in the supplemental results, and that could be part of the problem as well.
I have seen pages in supplemental results move to the main index after some deep linking to them. However, it usually takes a while.
on 15 May 2007 at 10:01 am 2.Chris Gray said …
Vince,
As “SEO minded” as WordPress seems to be, I am surprised that a robots.txt is not part of the default distribution. Three of my WordPress blogs ended up in the supplemental index before I was able to track it down to the lack of a robots.txt.
I wasn’t aware of your final point (lack of inbound links causing you to end up in the supplemental index). I will have to read up a little more on this one.
on 07 Jun 2007 at 7:48 pm 3.You’re a sneaky one Google Sitemap Generator - SEO Ladder said …
[…] I just had a WTF moment… A couple of weeks ago I posted an article on how to avoid Google’s Supplemental Index. I suggested using a robots.txt file to prevent search engine spiders from indexing duplicate […]
on 16 Feb 2008 at 12:48 pm 4.fedmich said …
I also encountered massive spider and spammer attacks, so I updated my robots.txt.
I hope this one could protect my site and save me bandwidth per month.
Thanks again