Swader

Reputation: 11597

How do I only allow crawlers to visit a part of the site?

I've got an Ajax-rich website with extensive _escaped_fragment_ portions for Ajax indexing. While all my _escaped_fragment_ URLs do 301 redirects to a special module which then outputs the HTML snapshots the crawlers need (i.e. mysite.com/#!/content redirects to mysite.com/?_escaped_fragment_=/content, which in turn 301s to mysite.com/raw/content), I'm somewhat afraid of users stumbling on those "raw" URLs themselves and making them appear in search engines.

In PHP, how do I make sure only robots can access this part of the website? (Much like Stack Overflow hides its sitemap from normal users and only lets robots access it.)

Upvotes: 0

Views: 112

Answers (1)

Quentin

Reputation: 944076

You can't, at least not reliably.

robots.txt asks spiders to keep out of parts of a site, but there is no equivalent for regular user agents.

The closest you could come would be to keep a whitelist of acceptable IP addresses or user agents and serve different content based on that … but that risks false positives.
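For illustration, here is a minimal sketch of that user-agent whitelist idea, assuming the raw snapshot pages go through a single PHP entry point; the bot list and the `isKnownCrawler()` helper are hypothetical, and user agents are trivially spoofed, so treat this as best-effort filtering rather than real access control:

```php
<?php
// Hypothetical whitelist of crawler user-agent substrings.
// User agents can be forged, so this only discourages casual visitors.
function isKnownCrawler(): bool
{
    $ua   = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $bots = ['Googlebot', 'Bingbot', 'Slurp', 'DuckDuckBot', 'Baiduspider'];

    foreach ($bots as $bot) {
        if (stripos($ua, $bot) !== false) {
            return true;
        }
    }
    return false;
}

// At the top of the /raw/... snapshot handler:
if (!isKnownCrawler()) {
    // Send ordinary visitors back to the normal Ajax URL instead of the snapshot.
    header('Location: /#!/content', true, 302);
    exit;
}

// ... otherwise render the HTML snapshot for the crawler ...
```

A slightly stricter variant also verifies the client IP with a reverse DNS lookup (e.g. `gethostbyaddr()`) against the crawler operator's domain, since the User-Agent header alone proves nothing.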

Personally, I'd stop catering for old IE, scrap the #! URIs and the _escaped_fragment_ hack, switch to using pushState and friends, and have the server build the initial view for any given page.

Upvotes: 2
