Reputation: 2364
What is the best way to scrape the below HTML from a web page? I want to pull out Apple, Orange and Grape and put them into a dropdown menu in my Android app. Should I use Jsoup for this, and if so, what would be the best way to do it? Should I use Regex instead?
<select name="fruit" id="fruit" >
<option value="APPLE">Apple</option>
<option value="ORANGE">Orange</option>
<option value="GRAPE">Grape</option>
</select>
Upvotes: 5
Views: 5628
Reputation: 1138
Depends, but I'd go with an XML/HTML parser. Don't use regex.
Example with jsoup:
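// Fetch and parse the page (on Android, run this off the main thread).
Document doc = Jsoup.connect(someUrl).get();
// Select every <option> inside <select id="fruit">.
Elements options = doc.select("select#fruit option");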
More on jsoup selector syntax.
I would go with either the built-in DOM parser or SAX parser. For a large document, SAX is faster and uses less memory, since it streams the input instead of building the whole tree in memory. For a small document there's not much difference. More on SAX vs DOM.
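To make the DOM vs SAX comparison concrete, here is a rough sketch that pulls the option labels out of the snippet from the question with both built-in parsers. The class name `FruitOptions`, the `SAMPLE` constant, and the method names are made up for illustration. It also assumes the markup is well-formed XML, which holds for this snippet but often not for real-world HTML; these strict parsers throw on malformed input, whereas a tolerant HTML parser such as jsoup does not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class FruitOptions {
    // The markup from the question; it happens to be well-formed XML.
    static final String SAMPLE =
        "<select name=\"fruit\" id=\"fruit\" >"
        + "<option value=\"APPLE\">Apple</option>"
        + "<option value=\"ORANGE\">Orange</option>"
        + "<option value=\"GRAPE\">Grape</option>"
        + "</select>";

    // DOM: build the whole tree in memory, then query it.
    static List<String> parseWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList options = doc.getElementsByTagName("option");
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < options.getLength(); i++) {
            labels.add(options.item(i).getTextContent().trim());
        }
        return labels;
    }

    // SAX: stream through the input and collect text as events arrive.
    static List<String> parseWithSax(String xml) throws Exception {
        List<String> labels = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <option>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("option".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("option".equals(qName)) {
                    labels.add(text.toString().trim());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return labels;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithDom(SAMPLE)); // [Apple, Orange, Grape]
        System.out.println(parseWithSax(SAMPLE)); // [Apple, Orange, Grape]
    }
}
```

Either list can then back the dropdown. For three options the two approaches are equivalent in practice; the SAX version only pays off when the document is large enough that holding the full tree in memory matters.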
Upvotes: 14
Reputation: 1281
For HTML parsing you can use jsoup. The usage is very easy and the API is great.
For me it worked great!
EDIT: too slow :D skyuzo's post is great :)
Upvotes: 2
Reputation: 15219
WebView is your friend:
http://developer.android.com/reference/android/webkit/WebView.html
It lets you grab HTML the way a browser does, and then you can do stuff with it. Note that it doesn't run JavaScript by default, so I hope that's plain HTML you have there, not some AJAX-fetched or JS-generated code :)
Upvotes: 1