Reputation: 21
I'm not sure how good Google's robots.txt tester is, and I'm wondering whether the following example of my robots.txt for my WooCommerce site will actually do the trick: block bots from adding to cart and from crawling cart pages, allow good bots like Google to crawl the site, and block some bots that have been causing resource usage problems. Here's my example below, with my **comments in bold (comments are not included in the actual robots.txt file):**
**Block some crawlers that were causing resource issues (do I need a separate "Disallow: /" for each one?)**

```
User-agent: Baiduspider
User-agent: Yandexbot
User-agent: MJ12Bot
User-agent: DotBot
User-agent: MauiBot
Disallow: /
```

**Allow all other bots**

```
User-agent: *
Allow: /
```

**Block all allowed bots from adding to cart and from crawling cart pages**

```
Disallow: /*add-to-cart=*
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml
```

I put this through Google's robots.txt checker and it came out with one warning on the crawl delay, telling me it would be ignored.
Upvotes: 2
Views: 1460
Reputation: 25524
Baidu and Yandex are actual search engines from China and Russia, respectively. I wouldn't recommend blocking them because they can send legitimate traffic to your site. I would remove:

```
User-agent: Baiduspider
User-agent: Yandexbot
```
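To answer the parenthetical in your comments: no, you don't need a separate `Disallow: /` for each bot. Consecutive `User-agent` lines form a single group that shares the rules below them, so after dropping Baidu and Yandex that block would look like this:

```
# One group: all three bots share the single Disallow rule
User-agent: MJ12Bot
User-agent: DotBot
User-agent: MauiBot
Disallow: /
```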
Your `Allow` rule is totally unnecessary. By default, crawling is allowed unless there is a matching `Disallow` rule. `Allow:` should only be used to add a more specific exception to a `Disallow:`. Most robots can't deal with it, so if you don't need it, you should take it out so that you don't confuse them.
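For example, this is the kind of exception pattern `Allow:` is designed for. It's just an illustrative sketch; the `/my-account/lost-password/` path is a hypothetical exception, not something your site necessarily needs to open up:

```
User-agent: *
Disallow: /my-account/
# Hypothetical: expose one page inside an otherwise blocked directory
Allow: /my-account/lost-password/
```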
`crawl-delay: 10` is way too slow for anything other than sites with just a handful of pages. Most ecommerce sites need bots to be able to crawl multiple pages per second to get their site indexed in search engines. While Google ignores this directive, setting a long crawl delay like this will prevent other search engines, like Bing, from effectively crawling and indexing your site.
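If you do want to keep some throttling for the engines that honor the directive, a much smaller value is safer. This is a sketch, assuming the value is interpreted as seconds between requests (which is how Bing treats it); `1` is just an illustrative starting point:

```
User-agent: *
# Roughly one request per second for bots that honor Crawl-delay
Crawl-delay: 1
```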
Most robots don't treat `*` as a wildcard in `Disallow:` rules. The major search engine bots will understand that rule, but few other bots will.
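One side note for the bots that do support wildcards: rules match by prefix, so the trailing `*` in your add-to-cart rule has no effect. A slightly cleaner equivalent would be:

```
Disallow: /*add-to-cart=
```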
Other than that, your robots.txt looks like it should do what you want. Google's testing tool is a good place to verify that Googlebot will behave as you expect.
Upvotes: 1