We've noticed that users who recently updated WordPress to version 3.5.1 have run into Google crawl errors. The problem occurs when they create a new post and add tags to it. We also suspect that a custom plugin that generates the sitemap is likely to cause crawl problems as well.
These errors simply mean that Googlebot (the robot Google uses to crawl and index your site) is having trouble with a particular page. 404 errors happen naturally from time to time because of the dynamic nature of the internet, and a handful of them won't affect how Google judges the structure of your website. A large number of them, however, can lead to a poor user experience and lower quality ratings, which in turn can hurt your position on the SERP (search engine results page).
We came up with a simple solution that may work well for you. It also asks crawlers to stay out of directories they have no reason to index, although bots that ignore robots.txt (as many spambots do) will not be stopped by it. It's worth a try, hmm? :)
Working out the robots.txt file
Go to your FTP (File Transfer Protocol) directory and locate the robots.txt file (the name is all lowercase). It should be in the root directory of your WordPress installation. An FTP client such as FileZilla or WinSCP makes transferring and editing files more comfortable. Connect to your server using the credentials you received from your hosting provider when you first registered. Download the robots.txt file, or edit it in place, and replace its contents with the following directives:
User-agent: *
Disallow: /feed/
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /xmlrpc.php
Disallow: ?wptheme=
Disallow: /transfer/
Disallow: /tweets/
Disallow: /mint/
Allow: /tag/mint/
Allow: /tag/feed/
Allow: /wp-content/online/

Sitemap: http://example.com/sitemap.xml

User-agent: ia_archiver
Disallow: /
We are stopping search engines from crawling our sensitive directories while explicitly allowing the /tag/mint/ and /tag/feed/ paths (this should limit the 404 errors that Google lists under the 'Crawl errors' section of Webmaster Tools).
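If you want to sanity-check rules like these before uploading them, a minimal sketch using Python's standard urllib.robotparser module could look like the one below. It uses only a subset of the directives (the "?wptheme=" line is left out to keep things simple), and the URLs are placeholders on example.com. Keep in mind that Python's parser applies the first matching rule, while Google picks the most specific one, so overlapping pairs such as /wp-content/ and /wp-content/online/ may be judged differently by the two.

```python
from urllib.robotparser import RobotFileParser

# Subset of the rules from this post, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Allow: /tag/feed/

Sitemap: http://example.com/sitemap.xml

User-agent: ia_archiver
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A regular crawler is kept out of the admin area...
print(parser.can_fetch("Googlebot", "http://example.com/wp-admin/"))  # False
# ...but may crawl the tag pages.
print(parser.can_fetch("Googlebot", "http://example.com/tag/feed/"))  # True
# ia_archiver is blocked from the whole site.
print(parser.can_fetch("ia_archiver", "http://example.com/"))  # False
```

This only tells you how one particular parser reads the file; the crawl errors report in Webmaster Tools remains the real test.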
Robots.txt & Sitemap
Remember to change the Sitemap URL in the robots.txt code from http://example.com/sitemap.xml to the address of your own website's sitemap.
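If you manage several sites, the replacement can be scripted. The snippet below is just a sketch: set_sitemap_url is a hypothetical helper name, and my-blog.example stands in for your own domain.

```python
def set_sitemap_url(robots_txt: str, sitemap_url: str) -> str:
    """Return the robots.txt text with every Sitemap line pointing at sitemap_url."""
    lines = []
    for line in robots_txt.splitlines():
        # Directive names are case-insensitive, so match "sitemap:" loosely.
        if line.strip().lower().startswith("sitemap:"):
            lines.append(f"Sitemap: {sitemap_url}")
        else:
            lines.append(line)
    return "\n".join(lines) + "\n"


template = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Sitemap: http://example.com/sitemap.xml\n"
)

# Hypothetical domain; replace it with the address of your own site.
print(set_sitemap_url(template, "http://my-blog.example/sitemap.xml"))
```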
Hope this sorts things out! Happy using!