Tim

Reputation: 99536

How to test if robots.txt works on a local web server on localhost?

I added a robots.txt file to the root directory of a local web server.

The URL of the robots.txt file on the server is http://localhost/myserver/robots.txt.

The content of the robots.txt file is:

User-agent: *
Disallow: /

How can I verify that the robots.txt file works for the local web server?

Do I need to install a web crawler or search engine locally and run it to verify that?

Thanks.

Upvotes: 4

Views: 5565

Answers (2)

yoyoyojoe

Reputation: 133

You can always use a service such as ngrok to expose your locally hosted app on a public URL, and then see whether it passes a generic crawler such as SEO Site Checkup.

For context, it might help to understand what robots.txt does.

"When a site owner wishes to give instructions to web robots they place a text file called robots.txt in the root of the web site hierarchy",

"Robots.txt files are particularly important for web crawlers from search engines such as Google." (Source: Wiki)

So, extending what is presented here, I gather that whether your robots.txt is working can only be officially verified by the bot operators.

In other words, if you have a web crawler that will crawl http://localhost/myserver/, you should be able to check whether it detects the robots.txt and honors its instructions. However, no public web crawler will crawl a site that is served only on localhost.
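If you want to try that yourself, Python's standard library already ships a robots.txt parser (urllib.robotparser) that a home-made crawler can use. The sketch below is a self-contained demo under some assumptions: instead of the asker's http://localhost/myserver/ setup, it starts a throwaway server on 127.0.0.1 with the question's robots.txt at its root, and the user agent name MyTestBot is made up.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.robotparser

# The rules from the question: block every path for every user agent.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def polite_can_fetch(robots_url: str, page_url: str, user_agent: str = "MyTestBot") -> bool:
    """Download robots.txt the way a well-behaved crawler would and ask
    whether page_url may be fetched by user_agent (MyTestBot is hypothetical)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()  # fetches robots_url over HTTP and parses it
    return parser.can_fetch(user_agent, page_url)


def run_local_check() -> bool:
    """Serve a temporary site on 127.0.0.1 with the question's robots.txt
    at its root, then check one page against it."""
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "robots.txt").write_text(ROBOTS_TXT)
        pathlib.Path(root, "index.html").write_text("<h1>hello</h1>")
        handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
        # Port 0 lets the OS pick a free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            base = f"http://127.0.0.1:{server.server_port}"
            return polite_can_fetch(f"{base}/robots.txt", f"{base}/index.html")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(run_local_check())  # False: the crawler honors "Disallow: /"
```

To point it at the question's own server, you could call polite_can_fetch("http://localhost/myserver/robots.txt", "http://localhost/myserver/"). Keep in mind that this passes the robots.txt URL explicitly; real crawlers only look for robots.txt at the root of the host.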

I think this makes sense and hope it helps.

Upvotes: 0

Darshan

Reputation: 2333

How can I verify that the robots.txt file works for the local web server?

As far as I know, the robots.txt file doesn't stop crawlers from crawling your site. It only asks them not to. That means you cannot really verify whether it works or not: compliance is up to each crawler.
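To make the "it only asks" point concrete: honoring robots.txt is a filter that runs inside the crawler's own code. Here is a minimal sketch using Python's urllib.robotparser on the question's two rules; the function name and the ExampleBot user agent are hypothetical. A crawler that never runs a check like this still receives every page, because the web server itself enforces nothing.

```python
import urllib.robotparser

# The two lines from the question's robots.txt.
ROBOTS_LINES = ["User-agent: *", "Disallow: /"]


def urls_a_polite_crawler_would_fetch(robots_lines, candidate_urls, user_agent="ExampleBot"):
    """Filter candidate_urls the way a crawler that honors robots.txt would.

    The decision happens entirely on the client side: the server keeps
    serving every URL to any client that simply skips this step.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)  # parse rules already in memory, no network access
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


print(urls_a_polite_crawler_would_fetch(
    ROBOTS_LINES,
    ["http://localhost/myserver/", "http://localhost/myserver/page.html"],
))
# []
```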

Instead, what you can and should verify is that crawlers are able to read your robots.txt when they visit your site. You can ensure this by following the conventions.

That means your robots.txt file should be present under the root path. If you are going to host your site under the xyz domain, then http://xyz/robots.txt should be the location.
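A quick way to see where a standards-following crawler will look is to strip a page URL down to its scheme and host and append /robots.txt. The small helper below is a hypothetical illustration of that convention, not part of any crawler's API. It also shows why a file at http://localhost/myserver/robots.txt would not be picked up for pages under http://localhost/myserver/ unless /myserver/ is itself the server root.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a standard crawler consults for page_url.

    Only the scheme and host (plus port, if any) are kept; the path is
    always replaced by /robots.txt, so files in subdirectories are ignored.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://localhost/myserver/index.html"))
# http://localhost/robots.txt
print(robots_url_for("http://xyz/some/page"))
# http://xyz/robots.txt
```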

For more information, check this.

If your site is live, you can use any online tool to verify that the robots.txt is accessible. One such tool is this.
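If you would rather script the accessibility check than use an online tool (this also works against localhost), a minimal sketch with Python's urllib.request could look like the following. The robots-check User-Agent string is an arbitrary placeholder, and connection failures (for example, nothing listening on the port) are deliberately left to raise urllib.error.URLError.

```python
import urllib.error
import urllib.request
from typing import Tuple


def robots_txt_status(robots_url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Fetch robots_url and return (HTTP status, body text).

    A crawler can only use the file if this returns 200 with your rules
    in the body. HTTP errors such as 404 are returned as (code, "").
    """
    # "robots-check" is a placeholder User-Agent, not a real crawler's name.
    request = urllib.request.Request(robots_url, headers={"User-Agent": "robots-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
```

For example, status, body = robots_txt_status("http://localhost/robots.txt"). A 404 is worth fixing: most crawlers, including Python's own robotparser, treat a missing robots.txt as "no restrictions".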

Upvotes: 4
