Reputation: 11
I am working on an Android application that fetches data from an HTML web page and parses it for use in the app. I tried Web-Harvest, but it does not seem to be fully compatible with Android. The application should download the page, parse it, extract the data it needs, and use that data in the app. So what is the standard, recommended way to scrape HTML pages on Android?
Upvotes: 0
Views: 904
Reputation: 488
I've been happy with using TagSoup and XOM to parse webpages on Android. With both in your classpath, you'd do something like:
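import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Nodes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// TagSoup turns real-world HTML into well-formed SAX events that XOM can build from
XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Builder bob = new Builder(tagsoup);
Document html = bob.build("http://www.yahoo.com");
Nodes images = html.query("//img");
for (int index = 0; index < images.size(); index++) {
    Element image = (Element) images.get(index);
    String src = image.getAttributeValue("src"); // null if the img has no src attribute
    // do something with it...
}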
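As for the "do something with it" step: the src values you read in the loop are often relative to the page, so you will usually want to resolve them against the page URL before downloading anything. Here is a minimal sketch using only java.net.URI. The class name SrcResolver and the method resolve are made up for illustration and are not part of XOM or TagSoup.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of XOM or TagSoup.
final class SrcResolver {

    private SrcResolver() {}

    /**
     * Resolves a possibly relative src value against the URL of the page it
     * came from. Returns null if src is missing or either value cannot be parsed.
     */
    static String resolve(String pageUrl, String src) {
        if (src == null || src.trim().isEmpty()) {
            return null;
        }
        try {
            URI base = new URI(pageUrl);
            // A base such as "http://www.yahoo.com" has an empty path, which
            // java.net.URI does not resolve relative paths against correctly,
            // so treat it as "http://www.yahoo.com/".
            if ("".equals(base.getPath())) {
                base = base.resolve("/");
            }
            return base.resolve(new URI(src.trim())).toString();
        } catch (URISyntaxException e) {
            // Real pages contain malformed URLs (unescaped spaces, for example).
            return null;
        }
    }
}
```

Inside the loop you would call something like SrcResolver.resolve("http://www.yahoo.com", src) and skip the image when the result is null.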
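TagSoup normally reports elements in the XHTML namespace, so if the plain //img query above returns nothing (or the HTML you're scraping otherwise has a namespace), query with a namespace context instead: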
XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");
Nodes images = html.query("//html:img", context);
Links:
XOM --> http://www.xom.nu
TagSoup --> http://ccil.org/~cowan/XML/tagsoup/
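Of course, you'll have to catch the exceptions that can occur while building the XML document from the web page: createXMLReader can throw SAXException, and Builder.build can throw ParsingException or IOException.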
Upvotes: 1