Reputation: 31
I was wondering - can a crawler be written entirely in JavaScript? That way, the crawler is only run when a user needs the information, and everything runs from the individual user's computer.
If the crawler is written server-side, doesn't that also run the risk of the server's IP being blocked?
Upvotes: 3
Views: 2280
Reputation: 91
There are ways to deal with the cross-domain problem. Search for "Access-Control-Allow-Origin" and you'll see how.
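To illustrate what that header does, here is a minimal Node.js sketch of a server that opts into cross-origin requests; keep in mind this only helps for servers that actually send the header, so it does not unlock arbitrary third-party sites:

    // Node.js: a server whose responses scripts on other origins are allowed to read
    const http = require('http');

    http.createServer(function (req, res) {
        res.setHeader('Access-Control-Allow-Origin', '*'); // opt in to cross-origin reads
        res.end('Readable by XMLHttpRequest/fetch from any page.');
    }).listen(8080);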
The easiest way to implement such a crawler is to write an addon (Firefox) or extension (Chrome) and inject your JavaScript code into each visited page. That way, you'll see exactly the same thing the document's author sees. You can simply read document.body.innerText and post the content to your server for indexing.
I have such a crawler working myself, with several browsers crawling from different IP addresses.
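A minimal sketch of that idea as a content script (the indexing endpoint is just a placeholder, and the extension would need permission to post to your server):

    // content script injected into each visited page by the extension
    var text = document.body.innerText;             // the rendered text the user sees
    fetch('https://crawler.example.com/index', {    // placeholder endpoint for your own server
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url: location.href, text: text })
    });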
Upvotes: 2
Reputation: 231113
It's possible to write a crawler in JavaScript using, for example, Node.js. However, you probably won't be able to run one in a user's browser, because the same-origin policy prevents page scripts from fetching and reading arbitrary third-party sites.
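As a rough sketch of the server-side (Node.js) route, using only the built-in https module; a real crawler would add an HTML parser, politeness delays, and robots.txt handling:

    // Minimal Node.js crawl step: fetch one page and pull out its links
    const https = require('https');

    function fetchPage(url, callback) {
        https.get(url, function (res) {
            let html = '';
            res.on('data', function (chunk) { html += chunk; });
            res.on('end', function () { callback(html); });
        });
    }

    fetchPage('https://example.com/', function (html) {
        // naive link extraction; use a proper HTML parser in practice
        const links = html.match(/href="(https?:\/\/[^"]+)"/g) || [];
        console.log(links);
    });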
Upvotes: 2
Reputation: 38603
First off, before getting into details, you must understand that crawling is extremely slow. Building any kind of meaningful index takes minutes for a single site, and at the very least days when you're crawling multiple sources (often weeks, months, or years). Serving a search by crawling live is not viable at all.
As for the details, there's nothing preventing you from writing a crawler in JavaScript. However, it can't be done in browser-embedded JavaScript, at least not without a server-side proxy, because of the cross-origin policy.
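A bare-bones sketch of such a proxy, assuming a Node.js server of your own: the browser asks it for a URL, it fetches the page and relays the HTML with a permissive CORS header (a real proxy would validate the target and handle redirects and errors):

    // Node.js: minimal cross-origin fetch proxy for browser-based crawling
    const http = require('http');
    const https = require('https');

    http.createServer(function (req, res) {
        // target is passed as ?url=..., e.g. /fetch?url=https%3A%2F%2Fexample.com%2F
        const target = new URL(req.url, 'http://localhost').searchParams.get('url');
        res.setHeader('Access-Control-Allow-Origin', '*'); // let page scripts read the reply
        if (!target) { res.statusCode = 400; return res.end('missing url'); }
        https.get(target, function (upstream) { upstream.pipe(res); })
             .on('error', function () { res.statusCode = 502; res.end('fetch failed'); });
    }).listen(8080);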
Upvotes: 2