James

Reputation: 81

Block URI segments on single-page app with robots.txt

I have a single-page app built with AngularJS to show content dynamically from a REST API based on the first URI parameter.

How can I block bots from crawling anything but the home page and login page?

An example URL would be:

http://example.com/CLIENT01

I have searched for examples and tried wildcarding using the following:

User-agent: *
Disallow: /*

Allow: /login

But this is not valid. I also cannot use meta robots tags in the HTML, since the page content is loaded dynamically after the header and footer.

Any ideas would be much appreciated!

Upvotes: 0

Views: 985

Answers (1)

eywu

Reputation: 2724

This should satisfy your use case; however, I'm not sure if it's exactly what you want.

User-agent: *
Disallow: /
Allow: /$
Allow: /login

The Disallow: / line stops crawlers from crawling anything at all. It is the most aggressive rule of the three.

Then the first Allow: line grants crawlers access to just the homepage and nothing else. Because the $ anchors the end of the path, query parameters and files that sit off the root will not be crawled. If you want to allow query parameters on the homepage too, you can add this as well:

Allow: /?

The final Allow: statement lets your login page be crawled. Honestly, most people don't allow their login page to be crawled, because it usually doesn't have content that will really rank. But it's perfectly acceptable, since there are edge cases where users will search for a login page if it isn't apparent from the homepage.
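If you want to sanity-check how these rules evaluate, here is a minimal sketch of the matching behavior, assuming Google-style semantics (the longest matching pattern wins, and Allow wins ties; `*` is a wildcard and a trailing `$` anchors the end of the path). Note that Python's built-in urllib.robotparser does not implement `$`/`*` wildcards, so this toy matcher is written by hand purely for illustration:

```python
import re

# The rules from the answer above, in order of appearance.
RULES = [
    ("Disallow", "/"),
    ("Allow", "/$"),
    ("Allow", "/login"),
]

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return "^" + regex + ("$" if anchored else "")

def is_allowed(path):
    best = ("Allow", "")  # no matching rule means the path is allowed
    for directive, pattern in RULES:
        if re.match(pattern_to_regex(pattern), path):
            # A longer pattern is more specific; Allow beats Disallow on a tie.
            if len(pattern) > len(best[1]) or (
                len(pattern) == len(best[1]) and directive == "Allow"
            ):
                best = (directive, pattern)
    return best[0] == "Allow"

print(is_allowed("/"))          # True  - homepage matches Allow: /$
print(is_allowed("/login"))     # True  - matches Allow: /login
print(is_allowed("/CLIENT01"))  # False - only Disallow: / matches
```

Running this shows the homepage and login page allowed while a client path like /CLIENT01 is blocked, which matches the intent of the original question.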

Upvotes: 1
