Reputation: 712
In website.com/path/ there is a robots.txt file that contains the following:
User-agent: *
Disallow: /
I do NOT want it to apply to website.com, but only to the path itself.
The question is: does / actually mean ./ or does it refer to the web root folder?
The reasoning: I don't want to list my folders in robots.txt, but if a crawler reaches a private folder through some external link, it should not index it.
Upvotes: 0
Views: 1197
Reputation: 96687
Your robots.txt MUST be placed in the host root; you can't have a robots.txt at example.com/path/robots.txt.
So you have to move your robots.txt one level up, to example.com/robots.txt. And now it's clear that Disallow: / blocks everything on this host.
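To see this with a real parser, here is a minimal sketch using Python's standard urllib.robotparser, assuming the question's two rules are served from the host root. The bot name "AnyBot" and the example URLs are hypothetical; every URL on the host comes back as not fetchable.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, as served from example.com/robots.txt
rules = """\
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# "AnyBot" is a hypothetical user agent; "*" applies to it.
for url in ("https://example.com/",
            "https://example.com/index.html",
            "https://example.com/path/page.html"):
    print(url, parser.can_fetch("AnyBot", url))  # False for all three
```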
If you don't want to give information about your "private" URLs, you could specify only the beginning of those URLs (if possible in your case):
User-agent: *
Disallow: /p
This would block all URLs that start with example.com/p, like:
example.com/p
example.com/p.html
example.com/path
example.com/path/
example.com/path/foobar
example.com/p12asokd1
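Disallow values are plain prefixes, which a quick check with Python's urllib.robotparser illustrates. This is a sketch; /about.html and /pricing.html are hypothetical public pages, and the second one shows the catch mentioned next: it is blocked too, simply because it also starts with /p.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /p"])

urls = [
    "https://example.com/p",
    "https://example.com/p.html",
    "https://example.com/path/foobar",
    "https://example.com/p12asokd1",
    "https://example.com/about.html",    # hypothetical public page: allowed
    "https://example.com/pricing.html",  # hypothetical public page: blocked by the prefix
]
for url in urls:
    print(url, "allowed" if parser.can_fetch("AnyBot", url) else "blocked")
```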
If this is not possible (e.g. if your public URLs might start with such characters, too), you could use the robots meta element instead.
Note that when you use robots.txt to block URLs, search engines might still index those URLs and link to them in their search results (e.g. when someone links to your private URLs), so these URLs aren't so "private" anymore. With the meta element, (polite) search engines won't even index the URL, which would be an advantage for you.
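The element goes in the page's head, e.g. <meta name="robots" content="noindex">. Because a crawler has to download the page to see it, the page must not also be disallowed in robots.txt. The sketch below shows roughly how a polite crawler might detect the directive with Python's html.parser; RobotsMetaFinder and is_noindex are hypothetical names, not part of any real crawler.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def is_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindex(page))  # True
```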
Upvotes: 3
Reputation: 134035
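You might try Disallow: /*/, which blocks every URL whose path contains a second slash, i.e. anything inside a folder. That will block /foo/bar.html, but it won't block /index.html in the root. Note that the * wildcard was not part of the original robots.txt convention; major crawlers such as Googlebot and Bingbot support it and RFC 9309 standardizes it, but simpler parsers may treat it as a literal character.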
Unfortunately, it won't block /foo, although depending on your web server, requests for /foo might be redirected to /foo/, which is blocked.
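Python's urllib.robotparser only does plain prefix matching, so to check what /*/ matches, here is a small hand-written matcher. It is a sketch of the RFC 9309 wildcard rules ("*" matches any run of characters, a trailing "$" anchors the end, otherwise it is a prefix match); rule_matches is a hypothetical helper, not a library function.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the start of a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and turn each "*" into ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, giving prefix semantics.
    return re.match(regex, path) is not None


for path in ("/foo/bar.html", "/foo/", "/index.html", "/foo"):
    print(path, rule_matches("/*/", path))
```

With /*/, only the paths that contain a second slash (/foo/bar.html and /foo/) match.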
Upvotes: 1
Reputation: 712
Sadly enough, it will also apply to the root folder.
In fact, every robots.txt applies first of all to the root folder; only within it can you give rules for specific folders.
From robotstxt.org:
When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.
For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".
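The rule quoted above is easy to reproduce with urllib.parse. This is a sketch; robots_txt_url is a hypothetical helper name. It also shows why a file at /path/robots.txt is never consulted: a crawler always derives the host-root location.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Keep scheme and host, replace path, query and fragment with /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/shop/index.html"))
# http://www.example.com/robots.txt
print(robots_txt_url("https://example.com/path/robots.txt"))
# https://example.com/robots.txt
```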
Upvotes: 0