henry.oswald

Reputation: 5434

robots.txt disallow wildcard

I am having trouble stopping Google from crawling a few URLs which cause errors.

I want to stop

but allow

I tried project/*/download/pdf but it doesn't seem to work. Does anyone know what would?

Upvotes: 4

Views: 4991

Answers (2)

unor

Reputation: 96577

The original robots.txt specification doesn't define any wildcards, but Google (and some others) added them to their parsers. I originally guessed you wouldn't need them for your case, but as Jim noted, that was wrong. The following robots.txt (using the * wildcard) should do the job:

User-agent: Googlebot
Disallow: /project/*/download
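If you want to sanity-check a rule locally, here is a minimal sketch (in Python, not Google's actual parser) of how a * wildcard in a Disallow value is commonly interpreted: the rule matches as a path prefix, with * standing for any run of characters and a trailing $ anchoring the match at the end. The function name and sample paths are illustrative assumptions, not part of the question.

```python
import re

def rule_matches(disallow_value: str, url_path: str) -> bool:
    """Rough Google-style matching of a Disallow value against a URL path.

    '*' matches any sequence of characters, a trailing '$' anchors the rule
    at the end of the path, and otherwise the rule matches as a prefix.
    This is an illustration only, not an official implementation.
    """
    anchored = disallow_value.endswith("$")
    value = disallow_value[:-1] if anchored else disallow_value

    # Escape literal characters, turn each '*' into '.*'
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in value)
    if anchored:
        pattern += "$"

    # re.match already anchors at the start, which gives prefix semantics
    return re.match(pattern, url_path) is not None

# Illustrative checks against the rule from the answer above
print(rule_matches("/project/*/download", "/project/123/download/pdf"))  # True  -> blocked
print(rule_matches("/project/*/download", "/project/123/view"))          # False -> allowed
```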

Upvotes: 1

John Kugelman

Reputation: 361595

Do you have a / at the beginning of the Disallow: line?

User-agent: googlebot
Disallow: /project/*/download/pdf
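The leading / matters because Disallow values are compared against the URL path, which always begins with a slash. A quick illustrative check (the sample path is an assumption):

```python
# A Disallow value without the leading slash can never be a prefix of any URL path.
path = "/project/123/download/pdf"
print(path.startswith("project/"))   # False - rule without leading slash never matches
print(path.startswith("/project/"))  # True  - rule with leading slash matches
```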

Upvotes: 5
