Lyall

Reputation: 1437

Which robots.txt for forwarded subdomain?

In theory I have two subdomains set up in my hosting:

subdomain1.mydomain.com

subdomain2.mydomain.com

subdomain2 has a CNAME record pointing to an external service.

mydomain.com has a robots.txt that allows indexing everything.

subdomain2.mydomain.com has a robots.txt that allows indexing nothing due to the CNAME record.

If I set up a forward from subdomain1.mydomain.com to subdomain2.mydomain.com, which robots.txt would be used if accessing a link to subdomain1.mydomain.com? Does the domain forward work in the same way as a CNAME record when it comes to robots.txt?

Upvotes: 0

Views: 510

Answers (2)

user65839

The difficulty is that you're looking at this from the standpoint of the software you're configuring, but search engines and other robots only see the document they get back from a URL, just as any user with a web browser would. Search engines will request http://subdomain1.mydomain.com/robots.txt and http://subdomain2.mydomain.com/robots.txt separately, and it's up to you (by configuring whatever software your server runs) to make sure each one serves what you want.
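
To make the per-host lookup concrete, here is a minimal sketch of how a crawler could derive the robots.txt URL from any page URL. The helper name `robots_txt_url` is hypothetical, and real crawlers may differ in details such as port or scheme handling.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the scheme and host of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


first = robots_txt_url("http://subdomain1.mydomain.com/some/page?x=1")
second = robots_txt_url("http://subdomain2.mydomain.com/other")
print(first)
print(second)
```

Each hostname maps to its own robots.txt URL, which is why the two subdomains are looked up independently.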

A CNAME is just a DNS alias: it adds a level of indirection when a domain name is resolved to an IP address. A robot will follow it when resolving the name to find the "real" IP to connect to, but it has no further bearing on what the GET /robots.txt request does once the robot has connected to the server.
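
As an illustration, the sketch below builds the raw HTTP/1.1 request a client would send for robots.txt. The function name `build_robots_request` is hypothetical. The point is that the Host header carries the name the client asked for, and the CNAME target never appears in the request.

```python
def build_robots_request(hostname: str) -> str:
    """Return the raw HTTP/1.1 request a client would send for /robots.txt."""
    return (
        "GET /robots.txt HTTP/1.1\r\n"
        # The Host header is the requested name, not the CNAME target.
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


request = build_robots_request("subdomain2.mydomain.com")
print(request)
```

The external service behind the CNAME receives this request and decides what to return for that Host value.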

As for "forwarding", that term can mean different things, so you need to know what a browser or robot actually receives when it requests the page. If it's a 301 or 302 redirect that sends the client to another URL, different search engines may handle it differently, particularly if the redirect goes to an entirely different domain. I would probably avoid it, simply because a lot of robots are poorly written. Some search engines have tools to help you see how their crawlers read your robots.txt URLs, such as Google's tool.
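
One way to find out what your "forward" really does is to request robots.txt without following redirects and look at the status code and Location header. The sketch below uses Python's standard `http.client`, which does not follow redirects on its own. The names `classify_forward` and `probe_robots` are hypothetical, and the probe assumes plain HTTP on port 80.

```python
import http.client
from typing import Optional, Tuple

REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_forward(status: int, location: Optional[str]) -> str:
    """Describe what a client sees for a given status and Location header."""
    if status in REDIRECT_CODES and location:
        return f"HTTP redirect ({status}) to {location}"
    if status == 200:
        return "served directly (no HTTP redirect visible to the client)"
    return f"other response ({status})"


def probe_robots(host: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Fetch /robots.txt from host over HTTP without following redirects."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", "/robots.txt")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example usage (needs network access, so it is left commented out):
# status, location = probe_robots("subdomain1.mydomain.com")
# print(classify_forward(status, location))
```

If the result is a redirect, the crawler has to decide whether to follow it. If the file is served directly, the crawler simply uses whatever content came back for that hostname.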

Upvotes: 1

Kaz Wolfe

Reputation: 438

This depends on your server setup.

Take the following config, for example:

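server {
    # Handle requests whose Host header is subdomainA.example.com
    server_name subdomainA.example.com;
    listen 80;

    # Redirect every request, including /robots.txt, to the same path on subdomainB
    return 302 http://subdomainB.example.com$request_uri;
}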

In this case, we're redirecting everything from subdomainA.example.com to subdomainB.example.com. This will include your robots.txt file.

However, if your configuration is set up to only redirect certain parts, your robots.txt file will only be redirected if it's on your list. This would be the case if you were redirecting only, say, /someFolder.

Note that if you don't return a 302 but just use a different root (e.g. subdomainA and subdomainB are different subdomains but serve the same content), your robots.txt content will be determined by the root directory.

So, if I'm understanding your config correctly, subdomain1 will use the robots.txt from subdomain2.
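
To see what that would mean in practice, here is a small sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies are assumptions standing in for the "allow everything" and "allow nothing" files described in the question, and `/page` is just a sample path.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: an "allow everything" file and an "allow nothing" file.
ALLOW_ALL = ["User-agent: *", "Disallow:"]
DISALLOW_ALL = ["User-agent: *", "Disallow: /"]


def can_crawl(robots_lines, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt lines let agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# After a full redirect, pages live on subdomain2 and its rules apply.
blocked = can_crawl(DISALLOW_ALL, "http://subdomain2.mydomain.com/page")
open_site = can_crawl(ALLOW_ALL, "http://mydomain.com/page")
print(blocked, open_site)
```

Under these assumptions, a crawler that ends up on subdomain2 is governed by the restrictive file, regardless of what the main domain allows.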

Upvotes: 1
