Reputation: 1225
I came across a site that uses the following in its robots.txt file:
User-agent: *
Disallow: /*.php$
So what does it do? Will it prevent web crawlers from crawling the following URLs?
https://example.com/index.php
https://example.com/index.php?page=Events&action=Upcoming
Will it block subdomains too?
https://subdomain.example.com/index.php
Upvotes: 1
Views: 253
Reputation: 759
It looks like regular expressions but regular expressions are not in the spec. But Google and Bing both honours wildcards (*) and end-of-url markers ($). You can try your robots.txt rules here.
Upvotes: 2
Reputation: 944054
So what does it do?
By spec it means "URLs starting with /*.php$
", which isn't very useful. There might be engines out that which support some custom syntax for it. I know some support wild cards, but that looks like regular expression syntax and I've not heard of anything that supports that in robots.txt.
Will it prevent web crawlers from crawling the following URLs?
By spec: No.
If anything supports regexs, then it will block the first one but not the second one.
Will it block subdomains too?
No. Each origin is independent when it comes to robots.txt. The subdomain site would need its own copy of the resource.
Upvotes: 4