user903772

Reputation: 1562

Web crawler in Groovy and Grails to crawl every day

I need to implement a web crawler that fetches data from a website every day. What is the best way to do this? Should I write a Groovy script and have it run daily? If I use a standalone script, I can't use my domain classes.

Any suggestions?

Upvotes: 1

Views: 1799

Answers (2)

Jan Weitz

Reputation: 454

I would create a service and schedule it via Quartz with a cron trigger.
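A minimal sketch of such a scheduled job, assuming the Grails Quartz plugin is installed and the class lives in `grails-app/jobs`. The names `DailyCrawlJob` and `crawlerService`, its `crawl()` method, and the 03:00 schedule are hypothetical placeholders, not something from the question:

```groovy
// grails-app/jobs/DailyCrawlJob.groovy (assumed location for Quartz plugin jobs)
class DailyCrawlJob {

    // Hypothetical Grails service; Grails injects it by property name.
    def crawlerService

    // Quartz plugin trigger DSL. Cron fields: sec min hour day-of-month month day-of-week.
    // '0 0 3 * * ?' fires once a day at 03:00:00.
    static triggers = {
        cron name: 'dailyCrawl', cronExpression: '0 0 3 * * ?'
    }

    // Called by Quartz each time the trigger fires.
    def execute() {
        crawlerService.crawl()
    }
}
```

Because the crawl runs inside the Grails application, the service can read and save domain classes as usual.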

The service itself should use Selenium for crawling. Depending on which sites you need to crawl, you might have to check how capable a browser you need for JavaScript support. HtmlUnit in Selenium may not cut it.

Therefore you need to make sure that you can install Firefox or Chrome on your Grails machine. To take this even further and separate the browser installation from your server, you can use another machine as a Selenium Grid node and the server as the Grid hub, to which all nodes connect. Your Grails service then no longer needs FirefoxDriver or ChromeDriver to crawl; it uses RemoteWebDriver instead, which drives the browser on your Selenium node through the hub.
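Roughly, the remote setup could look like the sketch below. It assumes Selenium 3 or 4 (`selenium-java`) is on the classpath and a Grid hub is already running; the hub address `http://grid-hub.example:4444/wd/hub` and the target URL are made-up placeholders:

```groovy
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.firefox.FirefoxOptions
import org.openqa.selenium.remote.RemoteWebDriver

// Placeholder hub address; replace with the host and port of your own Grid hub.
def hubUrl = new URL('http://grid-hub.example:4444/wd/hub')

// Ask the hub for a Firefox session; the browser itself runs on a node.
WebDriver driver = new RemoteWebDriver(hubUrl, new FirefoxOptions())
try {
    driver.get('https://example.com/')   // placeholder page to crawl
    println driver.title
    // Print the target of every link on the rendered page.
    driver.findElements(By.tagName('a')).each { link ->
        println link.getAttribute('href')
    }
} finally {
    driver.quit()   // always release the remote browser session
}
```

Only the driver construction differs from a local FirefoxDriver or ChromeDriver; the crawling code stays the same.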

To decouple the crawler's output from your Grails application, you might also want to use a messaging system. AMQP in combination with Apache Camel will get you very far. If you use Camel, check out how Camel can help you with Quartz.

Upvotes: 0

sbglasius

Reputation: 3124

I'd suggest using XmlSlurper to read data from the site, putting that logic in a Grails service, and using the Quartz plugin to schedule it. That way you have access to the domain model in Grails, and you can use the coolness of the slurper to navigate the fetched HTML. You might also need a parser like NekoHTML (http://nekohtml.sourceforge.net), since XmlSlurper on its own only accepts well-formed XML.
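As a rough illustration of the slurper part, here is a small self-contained script. The helper name `extractLinks` and the sample markup are made up for the example, and the import assumes Groovy 3 or newer:

```groovy
import groovy.xml.XmlSlurper   // Groovy 3+; on Groovy 2.x drop this line (groovy.util.XmlSlurper is auto-imported)

// Hypothetical helper: collect the href of every link in a page.
// The default slurper only accepts well-formed markup. For real-world HTML,
// pass in `new XmlSlurper(new org.cyberneko.html.parsers.SAXParser())` instead
// (requires NekoHTML on the classpath).
List<String> extractLinks(String markup, XmlSlurper slurper = new XmlSlurper()) {
    def page = slurper.parseText(markup)
    // NekoHTML reports element names in upper case by default, so compare case-insensitively.
    def links = page.depthFirst().findAll {
        it.name().equalsIgnoreCase('a') && it.@href.text()
    }
    return links.collect { it.@href.text() }
}

def sample = '<html><body><a href="/news">News</a><p><a href="/about">About</a></p></body></html>'
println extractLinks(sample)
```

In a Grails service the same method could take a URL, call `slurper.parse(url)` instead of `parseText`, and store the results in domain classes.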

Upvotes: 5
