dubpir8

Reputation: 21

Robots.txt - blocking bots from adding to cart in WooCommerce

I'm not sure how good Google's robots.txt tester is, and I'm wondering if the following robots.txt for my WooCommerce site will actually do the trick: block bots from adding to cart and from crawling cart pages, allow good bots like Google to crawl the site, and block some bots that have been causing high resource usage. Here's my example below with my **comments** (the comments are not included in the actual robots.txt file):


**block some crawlers that were causing resource issues (do I need a separate "Disallow: /" for each one?)

User-agent: Baiduspider
User-agent: Yandexbot
User-agent: MJ12Bot
User-agent: DotBot
User-agent: MauiBot
Disallow: /

**allow all other bots

User-agent: *
Allow: /

**drop all allowed bots from adding to cart and crawling cart pages

Disallow: /*add-to-cart=*
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml

I put this through Google's robots.txt checker and it came back with one warning on the crawl delay, saying it would be ignored.

Upvotes: 2

Views: 1460

Answers (1)

Stephen Ostermiller

Reputation: 25524

Baidu and Yandex are actual search engines from China and Russia respectively. I wouldn't recommend blocking them because they can send legitimate traffic to your site. I would remove

User-agent: Baiduspider
User-agent: Yandexbot

Your `Allow:` rule is totally unnecessary. By default, crawling is allowed unless there is a matching `Disallow:` rule. `Allow:` should only be used to add a more specific exception to a `Disallow:`. Most robots can't deal with it, so if you don't need it, you should take it out so that you don't confuse them.
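
For illustration, a minimal sketch of the one case where `Allow:` earns its place: carving a single page out of a blocked directory (the lost-password path here is just a hypothetical example, not from your file):

User-agent: *
# Block the whole account area...
Disallow: /my-account/
# ...but add a more specific exception for one page inside it (hypothetical path)
Allow: /my-account/lost-password/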

crawl-delay: 10 is way too slow for anything other than sites with just a handful of pages. Most ecommerce sites need bots to be able to crawl multiple pages per second to get their site indexed in search engines. While Google ignores this directive, setting a long crawl delay like this will prevent other search engines like Bing from effectively crawling and indexing your site.
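
If you still want to throttle the bots that honor the directive (Google ignores it either way), a much smaller value is a reasonable sketch, assuming one request per second is an acceptable load for your server:

User-agent: *
# Bing and some other crawlers honor Crawl-delay; 1 second throttles them
# without starving indexing the way 10 seconds would.
Crawl-delay: 1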

Most robots don't treat `*` as a wildcard in `Disallow:` rules. The major search engine bots will understand that rule, but few other bots will.
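
If you want to cover simpler bots as well, you could keep a plain prefix rule alongside the wildcard one. This is only a sketch and assumes your add-to-cart links use the standard ?add-to-cart= query string:

# Wildcard form, understood by Google, Bing and the other major crawlers
# (the trailing * is redundant, since rules already match by prefix)
Disallow: /*add-to-cart=
# Plain prefix fallback for add-to-cart links on the home page (hypothetical URL shape)
Disallow: /?add-to-cart=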

Other than that, your robots.txt looks like it should do what you want. Google's testing tool is a good place to verify that Googlebot will do what you want.

Upvotes: 1
