marcg

Reputation: 658

Should I use a trailing slash when disallowing a directory in robots.txt?

I want to disallow crawling of a directory, /acct, in robots.txt. Which rule should I use?

Disallow: /acct or Disallow: /acct/

/acct contains both subdirectories and files. What is the effect of a trailing slash?

Upvotes: 2

Views: 1666

Answers (1)

Stephen Ostermiller

Reputation: 25535

Since robots.txt rules are all "starts with" rules, both of your proposed rules would disallow the following:

  • https://example.com/acct/
  • https://example.com/acct/foo
  • https://example.com/acct/bar

However, the following would only be disallowed by the rule without the trailing slash:

  • https://example.com/acct
  • https://example.com/acct.html
  • https://example.com/acctbar
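To make the "starts with" behavior concrete, here is a minimal Python sketch of it. It assumes plain prefix matching only (no `*` or `$` wildcards, which some crawlers also support). The helper names `is_disallowed` and `blocked_paths` are made up for illustration and are not part of any crawler's API.

```python
def is_disallowed(path: str, rule: str) -> bool:
    """Return True if the Disallow rule matches the URL path by simple prefix."""
    # Assumption: plain "starts with" matching, no wildcard handling.
    return path.startswith(rule)


# URL paths taken from the two lists above.
paths = ["/acct", "/acct/", "/acct/foo", "/acct/bar", "/acct.html", "/acctbar"]


def blocked_paths(rule: str) -> list[str]:
    """Return the sample paths that the given Disallow rule would block."""
    return [p for p in paths if is_disallowed(p, rule)]


for rule in ("/acct", "/acct/"):
    print(f"Disallow: {rule} blocks {blocked_paths(rule)}")
```

With these sample paths, the rule without the trailing slash matches all six, while the rule with the trailing slash matches only the three under /acct/.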

Disallow: /acct/ is usually better because there is no risk of disallowing unexpected URLs. However, it does NOT prevent crawling of /acct.

Most web servers redirect a directory URL without a trailing slash to the same URL with the slash added. It is likely that on your server, https://example.com/acct redirects to https://example.com/acct/. If that is the case, it is usually fine to allow bots to crawl /acct with no trailing slash and see the redirect. They would still be blocked from crawling the target of the redirect.

Upvotes: 2
