Reputation: 117
I need a pattern-matching rule that produces these results:
allow /dir/path_name.htm/something
disallow /dir/path_name/something
and disallow /dir/path_name.htm
Actually, those two disallowed paths are accumulated typos; those pages never existed. How can I stop Google from ever crawling them again?
I tested the following at http://www.frobee.com/robots-txt-check/, but nothing seems to work:
Allow: /dir/*.htm/?*
Disallow: /dir/*
What went wrong? Thank you.
Upvotes: 3
Views: 218
Reputation: 99523
According to the spec:
http://www.robotstxt.org/norobots-rfc.txt
Wildcards (*) are not allowed. The paths are just exact matches. My guess is that you're using some form of rewriting and you don't want multiple URLs with the same content to show up. In that case this may be a better solution:
http://googlewebmastercentral.blogspot.de/2009/02/specify-your-canonical.html
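The canonicalization approach that post describes is a simple tag in each duplicate page's head, pointing search engines at the one URL you want indexed. A minimal sketch (the href shown is a hypothetical example, not one of the asker's real URLs):

```html
<!-- Place inside the <head> of every duplicate/rewritten variant of the page.
     href is a hypothetical example URL standing in for the preferred version. -->
<link rel="canonical" href="http://www.example.com/dir/path_name.htm/something" />
```

Unlike robots.txt, this doesn't block crawling, but it tells Google which variant to treat as authoritative, so the typo URLs stop competing in the index.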
Upvotes: 1