Reputation: 97
I am struggling with a robots.txt file in the following scenario. I would like all *.php files in the root folder to be indexed except for one (exceptions.php), and I would like nothing in any subdirectory of the root folder to be indexed.
I have tried the following, but it still allows access to PHP files in subdirectories, even though the subdirectories themselves are not indexed:
....
# robots.txt
User-agent: *
Allow: /*.php
disallow: /*
disallow: /exceptions.php
....
Can anyone help with this?
Upvotes: 1
Views: 2057
Reputation: 96567
For crawlers that interpret * in Disallow values as a wildcard (it’s not part of the original robots.txt spec, but many crawlers support it anyway), this should work:
User-agent: *
Disallow: /exceptions.php
Disallow: /*/
This disallows URLs like:
https://example.com/exceptions.php
https://example.com//
https://example.com/foo/
https://example.com/foo/bar.php
And it allows URLs like:
https://example.com/
https://example.com/foo.php
https://example.com/bar.html
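If you want to sanity-check the two lists above without waiting for a crawler, here is a minimal Python sketch. It assumes the common wildcard semantics (a rule is a prefix match on the URL path, * matches any run of characters, a trailing $ anchors the end); real crawlers may differ in details. The names rule_to_regex, is_disallowed and DISALLOW are made up for this example and are not part of any library.

```python
import re
from urllib.parse import urlsplit

# Rules from the robots.txt above (assumed wildcard-aware crawler).
DISALLOW = ["/exceptions.php", "/*/"]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is taken literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(url, rules=DISALLOW):
    """Return True if the URL's path matches any Disallow rule."""
    path = urlsplit(url).path or "/"
    # re.match anchors at the start of the path, which gives prefix matching.
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    for url in [
        "https://example.com/exceptions.php",
        "https://example.com/foo/bar.php",
        "https://example.com/",
        "https://example.com/foo.php",
    ]:
        print(url, "->", "disallowed" if is_disallowed(url) else "allowed")
```

Note that Python's built-in urllib.robotparser does not treat * in paths as a wildcard, which is why this sketch does its own matching.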
For crawlers that don’t interpret * in Disallow values as a wildcard, you would have to list all subfolders (on the first level):
User-agent: *
Disallow: /exceptions.php
Disallow: /foo/
Disallow: /bar/
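If there are many subfolders, you could generate that list instead of maintaining it by hand. This is only a rough sketch: it assumes the first-level folders on disk map one-to-one to URL paths, and build_robots_txt and its parameters are hypothetical names chosen for this example.

```python
from pathlib import Path


def build_robots_txt(web_root, blocked_files=("exceptions.php",)):
    """Build robots.txt text that blocks the given root-level files
    and every first-level subdirectory found under web_root."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{name}" for name in blocked_files]
    # Only direct children of the web root; sorted for stable output.
    subdirs = sorted(p.name for p in Path(web_root).iterdir() if p.is_dir())
    lines += [f"Disallow: /{name}/" for name in subdirs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumption: the script is run from the web root.
    print(build_robots_txt("."), end="")
```

Hidden folders (for example .git) would be listed as well, so you may want to filter those out before publishing the file.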
Upvotes: 1