I am trying to access elements within an HTML document in Android. I have retrieved the document using Volley (a StringRequest) and am now trying to parse it with jsoup.
The HTML document contains the following markup:
<div class="theProducts">
<h3>
<a href="http://www.myproduct.com/myproduct.html" >
This is the product information I want to access
<img src="http://prettypictures.myproduct.com/myproduct.jpg" alt="" />
</a>
</h3>
</div>
I can get the 'theProducts' elements from the document as follows:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(response); // response: the HTML string returned by the Volley StringRequest
String title = doc.title();
Elements productElements = doc.getElementsByClass("theProducts");
for (Element productElement : productElements) {
    //String name = productElement.attr("name");
    //String content = productElement.attr("content");
}
So I do get productElements back without any problem. However, I'm not sure how to access the specific text I want (i.e. 'This is the product information I want to access'). I can see it nested inside the results, but it's deeply nested.
Could anyone please explain the correct syntax to use? I'm not very familiar with the DOM, so I'm getting a little confused. I tried doc.getElementsByClass(theProducts.h3) and (theProducts#h3), but both returned 0 results.
I also tried outerHtml(), but that returns the entire <h3> section.
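The getElementsByClass attempts above most likely return 0 results because getElementsByClass takes a single class name, not a CSS selector, so "theProducts.h3" is looked up as a class literally named theProducts.h3. In jsoup, selector syntax is accepted by select(). A minimal sketch, assuming jsoup is on the classpath; the class ClassVsSelectorDemo, its methods, and the inlined copy of the question's HTML are hypothetical and only for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ClassVsSelectorDemo {
    // The markup from the question, inlined so the example is self-contained.
    static final String HTML =
            "<div class=\"theProducts\">\n"
            + "<h3>\n"
            + "<a href=\"http://www.myproduct.com/myproduct.html\" >\n"
            + "This is the product information I want to access\n"
            + "<img src=\"http://prettypictures.myproduct.com/myproduct.jpg\" alt=\"\" />\n"
            + "</a>\n"
            + "</h3>\n"
            + "</div>";

    // getElementsByClass() compares its argument against individual class
    // names, so "theProducts.h3" matches no element in this document.
    static int countByClassName(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass("theProducts.h3").size();
    }

    // select() parses CSS selector syntax: <h3> elements that are direct
    // children of a <div> with class "theProducts".
    static int countBySelector(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("div.theProducts > h3").size();
    }

    public static void main(String[] args) {
        System.out.println(countByClassName(HTML)); // 0
        System.out.println(countBySelector(HTML));  // 1
    }
}
```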
Any help is greatly appreciated.
An easy way to get the elements you want is to use a CSS selector with select():
Elements els = doc.select("div.theProducts>h3>a");
for (Element el : els) {
    System.out.println(el.text());
}
Here, doc.select("div.theProducts>h3>a") returns every a element that is a direct child of an h3, which in turn is a direct child of a div with the class theProducts. Calling el.text() on each match then gives the link text.
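A self-contained version of this selector approach, run against the markup from the question, might look like the sketch below. It assumes jsoup is on the classpath; the class ProductLinkExtractor and its method names are hypothetical. It also reads the link target with attr("href") alongside the text.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ProductLinkExtractor {
    // The markup from the question, including its line breaks.
    static final String HTML =
            "<div class=\"theProducts\">\n"
            + "<h3>\n"
            + "<a href=\"http://www.myproduct.com/myproduct.html\" >\n"
            + "This is the product information I want to access\n"
            + "<img src=\"http://prettypictures.myproduct.com/myproduct.jpg\" alt=\"\" />\n"
            + "</a>\n"
            + "</h3>\n"
            + "</div>";

    // Visible text of each matched <a>. text() skips tags such as <img>,
    // collapses runs of whitespace and trims the result.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element el : doc.select("div.theProducts>h3>a")) {
            texts.add(el.text());
        }
        return texts;
    }

    // Raw value of the href attribute of each matched <a>.
    static List<String> linkHrefs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element el : doc.select("div.theProducts>h3>a")) {
            hrefs.add(el.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        System.out.println(linkTexts(HTML)); // [This is the product information I want to access]
        System.out.println(linkHrefs(HTML)); // [http://www.myproduct.com/myproduct.html]
    }
}
```

Because text() normalizes whitespace, the line breaks around the link text in the original markup do not end up in the result.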
EDIT: For more details about selector syntax, see the jsoup selector documentation.
After a bit more searching, I found the answer here:
Parse the inner html tags using jSoup
I'll go and upvote it now!
Here is the answer from that page, applied to my question:
// Every <h3> in the document (not only those inside div.theProducts)
Elements headlinesCat1 = doc.getElementsByTag("h3");
for (Element headline : headlinesCat1) {
    // All <a> elements nested anywhere inside this <h3>
    Elements importantLinks = headline.getElementsByTag("a");
    for (Element link : importantLinks) {
        String linkHref = link.attr("href");
        String linkText = link.text(); // THIS IS THE TEXT I WANTED...
        System.out.println(linkHref);
        System.out.println(linkText);
    }
}
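Note that doc.getElementsByTag("h3") visits every <h3> in the document, so on a page with other headings this loop also collects links that are not inside div.theProducts. The sketch below compares that approach with a scoped variant that keeps the question's getElementsByClass("theProducts") loop and calls select() on each product element, which only searches that element and its descendants. It assumes jsoup is on the classpath; the class ScopedProductLinks, its methods, and the extra "Unrelated heading" link pointing at example.com are hypothetical additions used only to show the difference.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedProductLinks {
    // Hypothetical page: an unrelated <h3> link followed by the question's markup.
    static final String HTML =
            "<h3><a href=\"http://example.com/other.html\">Unrelated heading</a></h3>\n"
            + "<div class=\"theProducts\">\n"
            + "<h3>\n"
            + "<a href=\"http://www.myproduct.com/myproduct.html\" >\n"
            + "This is the product information I want to access\n"
            + "<img src=\"http://prettypictures.myproduct.com/myproduct.jpg\" alt=\"\" />\n"
            + "</a>\n"
            + "</h3>\n"
            + "</div>";

    // The approach above: links inside any <h3> on the page.
    static List<String> allH3LinkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element headline : doc.getElementsByTag("h3")) {
            for (Element link : headline.getElementsByTag("a")) {
                texts.add(link.text());
            }
        }
        return texts;
    }

    // Scoped variant: start from each div.theProducts (as in the question)
    // and run select() with that element as the search context.
    static List<String> productLinkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element productElement : doc.getElementsByClass("theProducts")) {
            for (Element link : productElement.select("h3 > a")) {
                texts.add(link.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(allH3LinkTexts(HTML));   // [Unrelated heading, This is the product information I want to access]
        System.out.println(productLinkTexts(HTML)); // [This is the product information I want to access]
    }
}
```

On the question's markup alone both methods return the same single string; the difference only shows up once the page contains other <h3> links.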