likeforex.com

Reputation: 117

robots.txt pattern matching not working

I need a pattern-matching rule that produces these results:

allow /dir/path_name.htm/something
disallow /dir/path_name/something
and disallow /dir/path_name.htm

Those two disallowed URLs are actually typos that accumulated over time; the pages never existed. How can I stop Google from crawling them ever again?

I tested the following at http://www.frobee.com/robots-txt-check/, but nothing seems to work:

Allow: /dir/*.htm/?*
Disallow: /dir/*

What went wrong? Thank you.

Upvotes: 3

Views: 218

Answers (1)

Evert

Reputation: 99523

According to the spec:

http://www.robotstxt.org/norobots-rfc.txt

Wildcards (*) are not allowed. Paths are matched literally as prefixes of the URL path, so a * in a rule is just an ordinary character. My guess is that you're using some form of rewriting and you don't want multiple URLs with the same content to show up. In that case this may be a better solution:

http://googlewebmastercentral.blogspot.de/2009/02/specify-your-canonical.html
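
To see why the rules from the question match nothing under that draft, here is a minimal Python sketch of its matching rule as I read it: rules are tried in file order, each path is compared as a literal prefix, the first match wins, and a URL is allowed if nothing matches. The function name `is_allowed` and the `(directive, prefix)` tuple layout are my own for illustration, and the sketch skips percent-decoding and User-agent grouping.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs in file order.
    Each prefix is compared literally, so '*' has no special meaning here.
    """
    for directive, prefix in rules:
        # Simplification: empty values are skipped rather than interpreted.
        if prefix and path.startswith(prefix):
            # The first matching rule decides the outcome.
            return directive.lower() == "allow"
    # No rule matched: the default is to allow.
    return True


# The rules from the question, taken literally.
rules = [
    ("Allow", "/dir/*.htm/?*"),
    ("Disallow", "/dir/*"),
]

for path in (
    "/dir/path_name.htm/something",
    "/dir/path_name/something",
    "/dir/path_name.htm",
):
    print(path, is_allowed(rules, path))
```

Under this reading all three paths come back as allowed, because no real URL starts with the literal text `/dir/*`. A crawler that implements wildcard extensions would treat the same rules differently.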

Upvotes: 1
