henry.oswald

Reputation: 5434

robots.txt disallow wildcard

I am having trouble stopping Google from crawling a few URLs which cause errors.

I want to stop

but allow

I tried project/*/download/pdf but it doesn't seem to work. Does anyone know what would?

Upvotes: 4

Views: 4991

Answers (2)

unor

Reputation: 96577

The original robots.txt specification doesn't define any wildcards, but Google (and some others) added them to their parsers. I originally guessed you wouldn't need them for your case, but as Jim noted, that was wrong. The following robots.txt (using the * wildcard) should do the job:

User-agent: Googlebot
Disallow: /project/*/download
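If you want to sanity-check a rule locally, here is a minimal sketch (in Python, not Google's actual parser) of how a * wildcard in a Disallow value is commonly interpreted: the rule matches as a path prefix, with * standing for any run of characters and a trailing $ anchoring the match at the end. The function name and sample paths are illustrative assumptions, not part of the question.

```python
import re

def rule_matches(disallow_value: str, url_path: str) -> bool:
    """Rough Google-style matching of a Disallow value against a URL path.

    '*' matches any sequence of characters, a trailing '$' anchors the rule
    at the end of the path, and otherwise the rule matches as a prefix.
    This is an illustration only, not an official implementation.
    """
    anchored = disallow_value.endswith("$")
    value = disallow_value[:-1] if anchored else disallow_value

    # Escape literal characters, turn each '*' into '.*'
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in value)
    if anchored:
        pattern += "$"

    # re.match already anchors at the start, which gives prefix semantics
    return re.match(pattern, url_path) is not None

# Illustrative checks against the rule from the answer above
print(rule_matches("/project/*/download", "/project/123/download/pdf"))  # True  -> blocked
print(rule_matches("/project/*/download", "/project/123/view"))          # False -> allowed
```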

Upvotes: 1

John Kugelman

Reputation: 361595

Do you have a / at the beginning of the Disallow: line?

User-agent: googlebot
Disallow: /project/*/download/pdf
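The leading / matters because Disallow values are compared against the URL path, which always begins with a slash. A quick illustrative check (the sample path is an assumption):

```python
# A Disallow value without the leading slash can never be a prefix of any URL path.
path = "/project/123/download/pdf"
print(path.startswith("project/"))   # False - rule without leading slash never matches
print(path.startswith("/project/"))  # True  - rule with leading slash matches
```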

Upvotes: 5
