Ray

Reputation: 2728

Why use robots.txt on JavaScript files?

Is there any reason you should or shouldn't allow access to JavaScript or CSS files? Specifically, common files such as jQuery.

Upvotes: 4

Views: 4239

Answers (2)

MrWhite

Reputation: 45913

Ordinarily you should not (or have no need to) disallow access to JavaScript and CSS files in robots.txt.

However, search engines (Google in particular) are getting better at indexing JavaScript-generated content. In most cases this is a good thing. On the other hand, JavaScript has also been used specifically to hide content from search engines, on the assumption that they did not execute JavaScript; that assumption may no longer hold. It has been suggested that by disallowing the specific JavaScript files that generate the content in robots.txt, you also block the search engines from executing them and seeing the hidden content - if that is the requirement.

This technique was suggested by seomofo in June 2010 with regard to blocking affiliate marketing links.
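As a rough illustration of that technique, the robots.txt rule might look something like this (the script path is hypothetical):

    # Hypothetical example: stop crawlers from fetching the script
    # that generates the content you don't want them to see.
    User-agent: *
    Disallow: /js/generated-content.js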

Upvotes: 1

eywu

Reputation: 2724

It's widely accepted that search engines allocate a certain amount of bandwidth or number of URLs to a given site per day. So some webmasters like to block JS, CSS, and boilerplate images from the search engines to conserve that crawl budget, so that Google or Bing will crawl more pages instead of unnecessary assets.
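For illustration only (the next paragraph covers why Google discourages this), such a robots.txt might look like the following, with hypothetical directory paths:

    # Hypothetical example: block static assets to conserve crawl budget.
    User-agent: *
    Disallow: /js/
    Disallow: /css/
    Disallow: /images/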

Googler Matt Cutts has asked in the past that webmasters not do this (http://www.seroundtable.com/googlebot-javascript-css-14930.html).

It appears that Google would like to know exactly how your site behaves, with and without JavaScript. There's plenty of evidence that they're rendering the entire page, as well as executing JavaScript that runs on page load (e.g. Facebook comments).

If you block even common jQuery files, Google really doesn't know whether it's a stock jQuery implementation or whether you've modified the core files, and hence modified the experience.

My suggestion would be to make sure all your JS, CSS, and boilerplate images are served off a separate domain or CNAME. I would monitor Googlebot's crawl through logs and Google Webmaster Tools, and observe whether or not it spends a lot of time and bandwidth crawling these assets. If not, then just let it keep crawling them.
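As a rough sketch of the log-monitoring step (assuming a standard combined access-log format and a file named access.log, both of which are assumptions here), something like this could tally which static assets Googlebot requests most:

    # Rough sketch: count Googlebot requests for static assets in an access log.
    # Assumes combined log format and the file name "access.log" (both assumptions).
    from collections import Counter

    ASSET_EXTS = (".js", ".css", ".png", ".jpg", ".gif")

    counts = Counter()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            try:
                # The request line sits between the first pair of double quotes,
                # e.g. "GET /js/jquery.min.js HTTP/1.1"
                request = line.split('"')[1]
                path = request.split()[1]
            except IndexError:
                continue
            if path.lower().endswith(ASSET_EXTS):
                counts[path] += 1

    # Show the most heavily crawled assets first.
    for path, hits in counts.most_common(20):
        print(f"{hits:6d}  {path}")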

As each site behaves differently, you could experiment and block some of the more heavily requested files that are sucking down a large amount of bandwidth ... and then observe whether Google's "pages crawled" count increases.

Upvotes: 8
