Reputation: 712

Will / in robots.txt also apply to the root directory?

There is a robots.txt file in website.com/path/ that contains the following:

User-agent: *
Disallow: /

I do NOT want it to apply to website.com as a whole, only to the path itself.

The question is: does / actually mean ./ or does it refer to the web root folder?

The reasoning: I don't want to list the folders in robots.txt, but if a crawler reaches a private page through some external link, it should not index it.

Upvotes: 0

Answers (3)

unor

Reputation: 96687

Your robots.txt MUST be placed in the host root; you can't have a robots.txt at example.com/path/robots.txt.

So you have to move your robots.txt one level up, to example.com/robots.txt. And now it's clear that Disallow: / blocks everything on this host.

If you don't want to give information about your "private" URLs, you could specify only the beginning of those URLs (if possible in your case):

User-agent: *
Disallow: /p

This would block all URLs that start with example.com/p, like:

example.com/p
example.com/p.html
example.com/path
example.com/path/
example.com/path/foobar
example.com/p12asokd1
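
The prefix matching described above can be checked locally. This is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the extra test paths / and /index.html are assumptions added for illustration, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the answer, as it would be served from
# example.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /p
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="*"):
    """Return True if the rules above let user_agent fetch url (hypothetical helper)."""
    return parser.can_fetch(user_agent, url)


# The first four paths come from the list above; the last two are
# illustrative public URLs that do not start with /p.
for path in ["/p", "/p.html", "/path/foobar", "/p12asokd1", "/", "/index.html"]:
    url = "http://example.com" + path
    print(path, "allowed" if is_allowed(url) else "blocked")
```

Every path that starts with /p is reported as blocked, while / and /index.html stay allowed.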

If this is not possible (e.g. if your public URLs might start with such characters, too), you could use the robots meta element instead.

Note that when using robots.txt to block URLs, search engines might still index your URLs and link to them in their search results (e.g. when someone links to your private URLs). So these URLs aren't so "private" anymore. When using the robots meta element with noindex, (polite) search engines won't even index the URL, so that would be an advantage for you.

Upvotes: 3

Jim Mischel

Reputation: 134035

You might try Disallow: /*/, which will block anything that has a path and a slash. That will block /foo/bar.html, but it won't block /index.html in the root.

Unfortunately, it won't block /foo, although depending on your web server, requests for /foo might be redirected to /foo/, which is blocked.
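
To see which paths a pattern like /*/ would cover, here is a rough sketch of wildcard matching. It assumes the common extension in which * matches any run of characters and a trailing $ anchors the end of the path; the original robots.txt convention had no wildcards, so not every crawler will honor this. The function name rule_matches is hypothetical, and Python's urllib.robotparser does not implement wildcards, which is why a small regex translation is used instead.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the start of path.

    Assumed semantics (a common extension, not the original convention):
    '*' matches any sequence of characters, and a trailing '$' anchors
    the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives prefix matching.
    return re.match(regex, path) is not None


for path in ["/foo/bar.html", "/foo/", "/foo", "/index.html"]:
    print(path, "blocked" if rule_matches("/*/", path) else "allowed")
```

Under these assumptions /foo/bar.html and /foo/ are blocked, while /foo and /index.html are not, which matches the behavior described in this answer.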

Upvotes: 1

bizna

Reputation: 712

Sadly enough, it will also apply to the root folder.

Actually, every robots.txt applies first of all to the root folder, and only then can you give details regarding specific folders.

From robotstxt.org:

When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".
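
The lookup quoted above can be sketched in a few lines of Python with the standard urllib.parse module. The function name robots_txt_url is hypothetical; it only illustrates that the path of the page URL plays no part in where a robot looks for the file.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL a robot would request for page_url.

    Only the scheme and host (with port, if any) are kept; the path,
    query and fragment are replaced by "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/shop/index.html"))
print(robots_txt_url("http://website.com/path/page.html"))
```

Both calls yield a URL in the host root, so a file at website.com/path/robots.txt is never requested.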

Upvotes: 0
