Abhishek Dilliwal

Reputation: 13

Identifying a Search Engine Crawler

I am working on a website that loads its data via AJAX. I also want the whole website to be crawlable by search engines like Google and Yahoo. I want to make 2 versions of the site... [1] When a user visits, the hyperlinks should work just like Gmail's (#'ed hyperlinks). [2] When a crawler visits, the hyperlinks should work normally (AJAX mode off).

How can I identify a crawler?

Upvotes: 1

Views: 720

Answers (4)

Quentin

Reputation: 943537

This approach just makes life difficult for you. It requires you to maintain two completely separate versions of the site and try to guess what version to serve to any given user. Search engines are not the only user agents that don't have JavaScript available and enabled.

Follow the principles of unobtrusive JavaScript and build on things that work. This avoids the need to determine which version to give to a user since the JS can gracefully fail while leaving a working HTML version.
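As a rough illustration of that principle, here is a sketch in TypeScript. The a.ajax-link markup and the #content container are assumptions made up for this example, not anything from the original answer: the links carry real hrefs that crawlers and no-JS browsers follow, while capable browsers intercept the click and fetch the same URL via AJAX.

TypeScript (browser):
// Assumed markup: <a class="ajax-link" href="/inbox">Inbox</a> and a <div id="content">.
document.querySelectorAll<HTMLAnchorElement>("a.ajax-link").forEach((link) => {
  link.addEventListener("click", async (event) => {
    event.preventDefault();                  // cancel normal navigation for JS users only
    const response = await fetch(link.href); // request the same URL the plain link points to
    const html = await response.text();
    const target = document.querySelector("#content");
    if (target) {
      target.innerHTML = html;               // swap in the fetched content
    }
  });
});

If JavaScript is unavailable, none of this runs and the plain hyperlinks keep working, which is exactly the graceful failure described above.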

Upvotes: 0

Brian Campbell

Reputation: 332826

You should not present a different version of your website to your users than to a crawler. If Google discovers you doing that, they may reduce your search ranking because of it. Also, if you have a version that only a crawler sees, it may break without you noticing, thus giving search engines bad data.

What I'd recommend is building a version of your site that doesn't require AJAX, and having prominent links on each page to the non-AJAX version. This will also help users who may not like the AJAX version, or whose browsers aren't capable of handling it properly.

Upvotes: 0

Mike Axiak

Reputation: 12004

Crawlers can usually be identified by the User-Agent HTTP header. Look at this page for a list of user agents for crawlers specifically. Some examples are:

Google:
Googlebot/2.1 (+http://www.googlebot.com/bot.html)

Also, here are some examples for getting the user agent string in various languages:

PHP:
$_SERVER['HTTP_USER_AGENT']

Python Django:
request.META["HTTP_USER_AGENT"]

Ruby On Rails:
request.env["HTTP_USER_AGENT"]

...
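Putting those lookups together, here is a rough, framework-agnostic sketch of the actual check. The isCrawler helper and the token list are illustrative assumptions, not an official or exhaustive list:

TypeScript:
// Pass in whatever User-Agent string your server exposes
// (e.g. the PHP/Django/Rails lookups shown above).
const CRAWLER_TOKENS = ["googlebot", "slurp", "bingbot", "harvest-ng"]; // illustrative only

function isCrawler(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token));
}

// isCrawler("Googlebot/2.1 (+http://www.googlebot.com/bot.html)") => true

Keep in mind that User-Agent strings are trivially spoofed, so this only identifies well-behaved crawlers.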

Upvotes: 1

Paul Rubel

Reputation: 27222

The HTTP headers of the crawler's request should contain a User-Agent field. You can check this field on your server.

Here is a list of TONS of User-Agents. Some examples:

Google robot 66.249.64.XXX ->
Googlebot/2.1 ( http://www.googlebot.com/bot.html)       

Harvest-NG web crawler used by search.yahoo.com 
Harvest-NG/1.0.2     

Upvotes: 0
