How can I disallow URLs like 1.html, 2.html, ..., [0-9]+.html (in regexp terms) with robots.txt?
The original robots.txt specification doesn't support regex/wildcards. However, you could block URLs like these with:
User-agent: *
Disallow: /0
Disallow: /1
Disallow: /2
Disallow: /3
Disallow: /4
Disallow: /5
Disallow: /6
Disallow: /7
Disallow: /8
Disallow: /9
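Note that robots.txt rules match by path prefix, so Disallow: /1 blocks not only /1.html but also /12.html, /1/, and anything else whose path begins with /1. As a sanity check, here is a minimal sketch using Python's standard urllib.robotparser, which applies the same plain prefix matching (example.com is just a placeholder host):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /0",
    "Disallow: /1",
    # ... rules for /2 through /8 omitted for brevity
    "Disallow: /9",
])

# Prefix matching: any path that starts with a digit is blocked.
print(rp.can_fetch("*", "http://example.com/1.html"))   # False (blocked)
print(rp.can_fetch("*", "http://example.com/12.html"))  # False (blocked, starts with /1)
print(rp.can_fetch("*", "http://example.com/a.html"))   # True  (allowed)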
If you want to block only URLs starting with a single numeral followed by .html, just append .html to each rule, like:
User-agent: *
Disallow: /0.html
Disallow: /1.html
…
However, this wouldn't block, for example, example.com/12.html, since that path starts with /1 but not with /1.html.
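You can confirm the difference with the same kind of robotparser sketch (again, example.com is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /0.html",
    "Disallow: /1.html",
    # ... rules for /2.html through /9.html omitted for brevity
])

print(rp.can_fetch("*", "http://example.com/1.html"))   # False (blocked)
print(rp.can_fetch("*", "http://example.com/12.html"))  # True: /12.html does not start with /1.html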