burak emre

Reputation: 1591

How can I make an indexable website that uses a JavaScript router?

I have been working on a project that uses the Backbone.js router, and all data is loaded by JavaScript via RESTful requests. I know there is no way to detect server-side whether JavaScript is enabled, but here are the scenarios I have thought of to make this website indexable:

  1. I can append a query string to each link in sitemap.xml and put a `<script>` tag on the page to detect whether JavaScript is enabled. The server renders this page with indexable data, and when a real user visits it I can manually initialize the Backbone.js router. However, I need to execute an SQL query to render the indexable data server-side, which causes extra load whenever the visitor is not a bot. Also, when users share a URL of the website somewhere, it won't be the indexable page, so web crawlers may not identify the content of that URL, and the extra query string in the crawler's result pages may be annoying for users.

  2. I can detect popular web crawlers like Google, Yahoo, Bing, and Facebook server-side from their user agents (a sketch of this approach follows this list), but I suspect there will be some web crawlers that I miss.
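A minimal sketch of the user-agent approach, assuming a Node.js/Express server (my own illustration, not from the question); the crawler regex and the `renderHtmlSnapshot` helper are hypothetical placeholders:

```js
// Sketch only: crawler detection by user agent in an assumed Express app.
const express = require('express');
const app = express();

// A few well-known crawler user agents. Any such list will be incomplete,
// which is exactly the drawback the question points out.
const CRAWLER_UA = /googlebot|bingbot|yahoo|slurp|facebookexternalhit/i;

// Hypothetical stand-in: in a real app this would run the same queries and
// templates to produce the markup the Backbone views render client-side.
function renderHtmlSnapshot(path) {
  return '<html><body><h1>Server-rendered content for ' + path + '</h1></body></html>';
}

app.get('*', function (req, res) {
  if (CRAWLER_UA.test(req.headers['user-agent'] || '')) {
    res.send(renderHtmlSnapshot(req.path)); // pre-rendered, indexable HTML
  } else {
    res.sendFile(__dirname + '/index.html'); // JS shell; Backbone router takes over
  }
});

app.listen(3000);
```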

Which way seems more convenient, or do you have any ideas & experience making this kind of website indexable?

Upvotes: 2

Views: 1077

Answers (1)

machineghost

Reputation: 35795

As elias94xx suggested in his comment, one solid solution to this dilemma is to take advantage of Google's "AJAX crawling" scheme. In short, Google told the web community: "look, we're not going to actually render your JS code for you, but if you want to render it server-side for us, we'll do our best to make it easy on you." They do that with two basic concepts: pretty URL => ugly URL translation, and HTML snapshots.

1) Google implemented a syntax web developers can use to specify client-side URLs that can still be crawled. The syntax for these "pretty URLs", as Google calls them, is: www.example.com?myquery#!key1=value1&key2=value2.

When you use a URL in that format, Google won't try to crawl that exact URL. Instead, it will crawl the "ugly URL" equivalent: www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2. Since that URL has a ? instead of a #, it will of course result in a call to your server. Your server can then use the "HTML snapshot" technique.
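To make the translation concrete, here is a small sketch (mine, not part of the original answer) that recovers the pretty URL's hash-fragment state from the `_escaped_fragment_` parameter in Node.js:

```js
// Sketch: given the "ugly URL" Google actually requests, recover the
// state that appeared after #! in the pretty URL.
function prettyFragment(uglyUrl) {
  const params = new URL(uglyUrl).searchParams;
  // searchParams.get() percent-decodes, so %26 comes back as &.
  // Returns null when the request is not an AJAX-crawling request.
  return params.get('_escaped_fragment_');
}

// Using the answer's example URLs:
// prettyFragment('http://www.example.com?myquery&_escaped_fragment_=key1=value1%26key2=value2')
//   => 'key1=value1&key2=value2'   (the part after #! in the pretty URL)
```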

2) The basic idea of that technique is that you have your web server run a headless JS runner. When Google requests an "ugly URL" from your server, the server loads your Backbone router code in the headless runner, which generates (and then returns to Google) the same HTML that code would have generated had it been run client-side.
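Here is a sketch of that snapshot step, assuming Puppeteer as the headless runner (the answer doesn't name a tool; PhantomJS was the usual choice at the time, but the idea is the same):

```js
// Sketch: serve an HTML snapshot by running the client-side app headlessly.
// Assumes Puppeteer; the answer only says "a headless JS runner".
const puppeteer = require('puppeteer');

async function htmlSnapshot(prettyUrl) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Load the app and wait until network traffic settles, so the
    // Backbone router has fetched its REST data and rendered the view.
    await page.goto(prettyUrl, { waitUntil: 'networkidle0' });
    return await page.content(); // the fully rendered markup
  } finally {
    await browser.close();
  }
}
```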


A full explanation of pretty=>ugly URLs can be found here: https://developers.google.com/webmasters/ajax-crawling/docs/specification

A full explanation of HTML snapshots can be found here: https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot

Oh, and while everything so far has been based on Google, Bing/Yahoo also adopted this syntax, as indicated by Squidoo here:
http://www.squidoo.com/ajax-crawling

Upvotes: 4
