
WordPress & SEO Chris Gray on 07 Jun 2007 07:01 pm

You’re a sneaky one Google Sitemap Generator

Ok, I just had a WTF moment… A couple of weeks ago I posted an article on how to avoid Google’s Supplemental Index. I suggested using a robots.txt file to keep Google from indexing duplicate content. While perusing my WordPress root directory last night I took a peek at my sitemap (sitemap.xml). Much to my surprise, it contained a bunch of URIs that I had just finished blocking in my robots.txt…WTF!?

And then it hit me…I was using the Google Sitemap Generator plug-in to automatically generate my sitemap…there was probably a setting in there throwing things off. For those of you who are not familiar with this plug-in, it automatically re-generates your sitemap.xml every time you post…saving you the trouble of updating it by hand after each post.

Sure enough, it turns out that I didn’t uncheck a number of key default settings when I activated the plug-in. The default settings are shown below:

Google Sitemap Generator Defaults

Well there it is…duplicate content. Since I am showing full posts on the home page and categories, I was asking Google to index duplicate content by also including individual posts and archives in the sitemap. This is a case where the defaults really should not be the defaults, IMHO. The thing that gets me is that the readme for this plug-in actually says to use the defaults!

Q: “So much configuration options… Do I need to change them?”
A: “No! Only if you want. Default values should be ok!”

So, in an effort to remove duplicate content from my sitemap, I checked only “Include posts” and “Include static pages” (see below). I am hoping that this will help.

Google Sitemap Generator Fixed Settings

The lesson I learned all too well was that I need to take the time to understand and configure a new plug-in before I activate it. From now on I plan on testing out new plug-ins thoroughly (in my local sandbox) prior to activating them on my production site…
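A mismatch like this one (URLs listed in sitemap.xml that robots.txt blocks) is easy to check for mechanically. Here is a minimal Python sketch of such a check, using only the standard library. It is not part of the plug-in: the function name `blocked_sitemap_urls`, the example.com URLs, and the sample robots.txt rules are all made up for illustration, so swap in the contents of your own files.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent.

    Hypothetical helper for illustration; both arguments are the raw
    text of the respective files.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Match <loc> regardless of which sitemap namespace the file declares.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())

    return [url for url in urls if not rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Sample data only; these rules and URLs are assumptions, not my real files.
    sample_robots = """\
User-agent: *
Disallow: /category/
Disallow: /archives/
"""
    sample_sitemap = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/2007/06/some-post/</loc></url>
  <url><loc>http://example.com/category/seo/</loc></url>
  <url><loc>http://example.com/archives/2007/06/</loc></url>
</urlset>
"""
    for url in blocked_sitemap_urls(sample_robots, sample_sitemap):
        print("In sitemap but blocked by robots.txt:", url)
```

With the sample data, the /category/ and /archives/ URLs should be the ones reported. Note that Python's robots.txt parser does simple prefix matching, so if your rules rely on wildcards the result may not match how Google itself reads them.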

So now I am curious…what is the best way to configure Google Sitemap Generator for SEO? I would love to hear your recommendations!

11 Responses to “You’re a sneaky one Google Sitemap Generator”

  1. on 12 Jun 2007 at 5:25 am 1.kichus said …

    Did you forget to include /category/ in your robots.txt file? Just thought I would let you know in case you really missed it.

  2. on 12 Jun 2007 at 6:28 am 2.Chris Gray said …

    kichus - you are correct, I missed /category/…thanks! I will add this one today. Thanks for the catch.

  3. on 07 Aug 2007 at 4:51 pm 3.David Airey :: Graphic Designer » Top 5 essential WordPress Plugins said …

    […] Chris gives us an important Plugin tip for the sitemap generator […]

  4. on 12 Aug 2007 at 9:52 am 4.Rob Cubbon - freelance graphic designer said …

    Chris, thanks for this. I have amended my “Includings” accordingly. Do you think Disallow: /category/ is necessary in robots.txt?

  5. on 12 Aug 2007 at 12:06 pm 5.LaurenMarie - Creative Curio said …

    I followed the link over from David’s site. Thank you for posting this! I’m going to install the plugin per David’s recommendation and you have helped me avoid the evil duplicate content ;)

  6. on 12 Aug 2007 at 4:46 pm 6.kristarella said …

    Thanks for that! Might be a way to stop Google sucking more bandwidth :)

  7. on 20 Aug 2007 at 9:52 pm 7.Chris Gray said …

    @Rob - I exclude /category/ in my robots.txt because I publish full text on my category pages (so disallowing “/category/” in robots.txt prevents search engines from indexing duplicate content). The same goes for my sitemap…I do not include categories in it for the same reason.

    @LaurenMarie - Glad to be of some assistance :) More and more, I am finding that WordPress plug-ins do not always contain the best “default values” and I end up having to test pretty thoroughly before activating them. Hope the Google SiteMap plug-in works out well for you!

    @kristarella - No problem at all :) I am not 100% sure what you mean by “sucking bandwidth” but this should reduce the probability that you will end up in the supplemental index. If you are concerned about the rate at which Google crawls your site you can adjust the “Crawl rate” in Google’s Webmaster Tools control panel.

  8. on 21 Aug 2007 at 3:23 pm 8.Rob Cubbon - freelance graphic designer said …

    Have disallowed /category/ and /archives/ in my robots.txt – thanks so much for this, Chris.

  9. on 21 Aug 2007 at 10:44 pm 9.Chris Gray said …

    @Rob - awesome, hope it helps keep you out of the supplemental index! I just checked my site and these steps appear to have helped reduce the number of pages I have in there at the moment. I still have a few but I expect them to fall out over time. I would enjoy hearing about your results (after tweaking your robots.txt).

  10. on 23 Aug 2007 at 3:39 am 10.Rob Cubbon - freelance graphic designer said …

    The good news is my traffic from Google seems to have increased. Unfortunately I made the Google Sitemap at more or less the same time as I tweaked my robots.txt so I don’t really know which one has made the difference.

    Will keep you posted if there are any further developments.

    Once again, thanks for the tips!

  11. on 07 Sep 2007 at 10:05 am 11.Chris Gray said …

    @Rob - I know, I am in the same boat. I tend to make a number of changes all at once…so I never know which change had the biggest impact. I guess I should be a little more scientific when testing out a new change.

    Thanks again for sharing your experience with your sitemap/robots.txt.
