Reputation: 19001
Is there a good solution, or a headless browser, that I can use on GAE? I am working on an application on GAE that will read some web pages, parse them, and compute some statistics on them. There is a discussion going on here about making HTMLUnit work on GAE, but I am not sure it will work.
Upvotes: 1
Views: 275
Reputation: 20961
If you're okay with just getting the HTML (and not executing JavaScript), jsoup (jsoup.org) might be worth a look:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
(sample code shamelessly copied from jsoup)
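For the "do some statistics" part of the question, here is a minimal sketch of what that could look like once jsoup has parsed a page. It assumes the jsoup jar is on the classpath and Java 8 or later; the class name LinkStats, the method linksPerHost, and the inline HTML are made up for illustration. It parses a string rather than fetching a URL, so it runs without network access:

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkStats {

    // Hypothetical helper: counts how many links in the document point at each host.
    static Map<String, Integer> linksPerHost(Document doc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Element link : doc.select("a[href]")) {
            // absUrl resolves the href against the document's base URI
            // and returns "" when no absolute URL can be built.
            String url = link.absUrl("href");
            if (url.isEmpty()) {
                continue;
            }
            try {
                String host = URI.create(url).getHost();
                if (host != null) {
                    counts.merge(host, 1, Integer::sum);
                }
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Inline HTML stands in for a fetched page (illustrative data only).
        String html = "<p><a href='/wiki/A'>A</a> "
                + "<a href='http://en.wikipedia.org/wiki/B'>B</a> "
                + "<a href='http://en.wikipedia.org/wiki/C'>C</a></p>";

        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(html, "http://example.com/");

        System.out.println(linksPerHost(doc));
    }
}
```

To run the same count against a live page, the Jsoup.parse call could be swapped for Jsoup.connect(url).get() as in the snippet above. Whether outbound fetches work for you on GAE depends on your runtime and its limits, so treat that part as untested.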
Upvotes: 1