NL500

Reputation: 1225

What does /*.php$ mean in robots.txt?

I came across a site that uses the following in its robots.txt file:

User-agent: *
Disallow: /*.php$

So what does it do? Will it prevent web crawlers from crawling the following URLs?

https://example.com/index.php
https://example.com/index.php?page=Events&action=Upcoming

Will it block subdomains too?

https://subdomain.example.com/index.php

Upvotes: 1

Views: 253

Answers (2)

Niklas Brunberg

Reputation: 759

It looks like a regular expression, but regular expressions are not part of the robots.txt spec. Google and Bing, however, both honour wildcards (*) and end-of-URL markers ($). You can try your robots.txt rules here.
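To make the wildcard reading concrete, here is a minimal sketch of how a crawler that honours * and $ might evaluate the rule. The helper names (pattern_to_regex, is_disallowed) are made up for illustration, and real crawlers may differ in details such as percent-encoding and rule precedence.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(url: str, pattern: str) -> bool:
    """Return True if the URL's path (plus query string) matches the pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, mirroring prefix-style rule matching.
    return pattern_to_regex(pattern).match(target) is not None


print(is_disallowed("https://example.com/index.php", "/*.php$"))
print(is_disallowed(
    "https://example.com/index.php?page=Events&action=Upcoming", "/*.php$"))
```

Under these assumptions the first URL is blocked and the second is not, because the query string means the URL no longer ends in .php. Dropping the $ from the rule would make both match.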

Upvotes: 2

Quentin

Reputation: 944054

So what does it do?

By spec it means "URLs starting with /*.php$", which isn't very useful. There might be engines out there that support some custom syntax for it. I know some support wildcards, but that looks like regular expression syntax, and I've not heard of anything that supports that in robots.txt.

Will it prevent web crawlers from crawling the following URLs?

By spec: No.

If anything supports regexes, then it will block the first one but not the second one, since the $ requires the URL to end in .php and the second URL continues with a query string.
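The "by spec" reading can be checked with a parser that does plain prefix matching. A small sketch using Python's standard-library urllib.robotparser, which to my knowledge does not implement the * and $ extensions inside paths; treat the result as illustrative of literal prefix matching rather than of what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, fed in directly instead of fetched.
rules = [
    "User-agent: *",
    "Disallow: /*.php$",
]

parser = RobotFileParser()
parser.parse(rules)

urls = [
    "https://example.com/index.php",
    "https://example.com/index.php?page=Events&action=Upcoming",
]

# With literal prefix matching, neither path starts with "/*.php$",
# so both URLs are expected to remain fetchable.
results = {url: parser.can_fetch("*", url) for url in urls}
for url, allowed in results.items():
    print(url, "allowed" if allowed else "blocked")
```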

Will it block subdomains too?

No. Each origin is independent when it comes to robots.txt. The subdomain site would need its own copy of the resource.
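As a rough illustration of the per-origin rule, the robots.txt location can be derived from a page URL by keeping the scheme and host and replacing everything else. The function name robots_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (with port, if any); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/index.php"))
print(robots_url("https://subdomain.example.com/index.php"))
```

The two calls give different files, so rules published at https://example.com/robots.txt say nothing about https://subdomain.example.com/.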

Upvotes: 4
