user3344151

Reputation: 11

Block search engines from crawling beyond a certain directory depth

The longest URLs on my site have a structure like this:

http://www.example.com/xyz-pqr/abcd-efgh/123.html

So there are at most 3 levels, but because of the CMS and other problems, search engines are indexing my site with URLs deeper than 3 levels, such as:

http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/abcd-efgh/123.html

I want to write rules in robots.txt so that search engines never crawl URLs deeper than 3 levels. How do I do this? Thanks in advance.

Upvotes: 0

Views: 283

Answers (1)

Jim Mischel

Reputation: 133995

I'm not certain, but I think the following should work:

User-agent: *
Disallow: /*/*/*/

So, given these two URLs:

http://www.example.com/xyz-pqr/abcd-efgh/123.html
http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html

The first would be allowed because it has only two directory segments (/xyz-pqr/abcd-efgh/).

The second would be blocked because it has three directory segments (/xyz-pqr/abcd-efgh/foo-bar/), so its path matches the pattern /*/*/*/.

Anything deeper would be blocked as well, since the rule matches as a prefix of the path.
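If you want to sanity-check the rule before deploying it, here is a minimal Python sketch of wildcard matching as I understand the major crawlers to do it: `*` matches any run of characters, a trailing `$` anchors the end, and the rule otherwise matches as a prefix of the URL path. The helper names `rule_to_regex` and `is_blocked` are made up for this illustration and are not part of any library; real crawlers may differ in details, so treat it as an approximation.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule with * and $ into a compiled regex.

    Hypothetical helper: an approximation of wildcard matching,
    not any crawler's actual implementation.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, rule: str = "/*/*/*/") -> bool:
    """Return True if the URL's path is matched by the Disallow rule."""
    path = urlsplit(url).path or "/"
    # re.match anchors at the start of the path, giving prefix matching.
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    for url in (
        "http://www.example.com/xyz-pqr/abcd-efgh/123.html",
        "http://www.example.com/xyz-pqr/abcd-efgh/foo-bar/123.html",
        "http://www.example.com/xyz-pqr/abcd-efgh/xyz-pqr/abcd-efgh/123.html",
    ):
        print(is_blocked(url), url)
```

The rule needs four slashes in the path to match, which is why a page two directories deep (three slashes) gets through while anything deeper does not.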

Upvotes: 1
