I'm using jsoup to scrape a webpage. Can anybody help me out or point me in the right direction on how to get the links contained in this HTML? Presently I'm running a for-each loop; it iterates through the elements but doesn't find the link, and it stops after one iteration.
The HTML:
<div>
  <div style = a bunch of different inline styles here>
    <div class = "_6d3hm _mnav9">
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "the link i want">_</a>
      </div>
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "another link i want">_</a>
      </div>
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "another link i want">_</a>
      </div>
    </div>
    <div class = "_6d3hm _mnav9">
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "the link i want">_</a>
      </div>
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "another link i want">_</a>
      </div>
      <div class = "_mck9w _gvoze _tn0ps">
        <a href= "another link i want">_</a>
      </div>
This is my Java code using jsoup. I've experimented with a bunch of different tags:
for (Element row : doc.select("div")) {
    System.out.println("iterating");
    final String link = row.getElementsByTag("._mck9w _gvoze _tn0ps").text();
    System.out.println(link);
}
Does anybody have an idea how I can scrape every link I've mentioned in the HTML?
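The error is in this line: row.getElementsByTag("._mck9w _gvoze _tn0ps"). getElementsByTag() expects a tag name such as a, not a CSS class selector, so it matches nothing; and even on the right elements, .text() would return the link text (_) rather than the URL. You are looking for a tags and their href attribute, so your code should look like this: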
for (Element row : doc.select("div")) {
    System.out.println("iterating");
    // Elements.attr() returns the href of the first <a> inside this div that has one
    final String link = row.getElementsByTag("a").attr("href");
    System.out.println(link);
}
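A small sketch of why the original line printed nothing, assuming jsoup is on the classpath; the trimmed markup and its href values (/p/1/, /p/2/) are hypothetical stand-ins for the question's HTML. It also collects the hrefs with a single a[href] selector, because looping over doc.select("div") visits the wrapper divs too, so the loop above prints the first link of each wrapper again in addition to each individual link.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TagVsClassDemo {

    // Trimmed copy of the question's markup; the href values are hypothetical placeholders.
    static final String HTML =
            "<div><div class=\"_6d3hm _mnav9\">"
            + "<div class=\"_mck9w _gvoze _tn0ps\"><a href=\"/p/1/\">_</a></div>"
            + "<div class=\"_mck9w _gvoze _tn0ps\"><a href=\"/p/2/\">_</a></div>"
            + "</div></div>";

    // getElementsByTag() treats its argument as a literal tag name, so a class string matches nothing.
    static int countByTagName(Document doc) {
        return doc.getElementsByTag("._mck9w _gvoze _tn0ps").size();
    }

    // getElementsByClass() takes one class name, without the leading dot.
    static int countByClassName(Document doc) {
        return doc.getElementsByClass("_mck9w").size();
    }

    // Select every <a> that has an href directly, instead of looping over all divs.
    static List<String> hrefs(Document doc) {
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(countByTagName(doc));   // 0
        System.out.println(countByClassName(doc)); // 2
        System.out.println(hrefs(doc));            // [/p/1/, /p/2/]
    }
}
```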
If you want to use the fact that the div has a class attribute with the given values, you can try something like this:
// a div carrying all three classes, then its direct <a> child
for (Element e : doc.select("div._mck9w._gvoze._tn0ps > a")) {
    System.out.println(e.attr("href"));
}
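A self-contained version of the selector approach, as a sketch assuming jsoup is on the classpath. The markup, the href values and the base URI https://example.com/ are placeholders. The three classes are chained with no spaces, so all of them must be on the same div, and > limits the match to a direct child; absUrl("href") resolves relative links against the base URI, whereas attr("href") returns the raw value.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ClassSelectorScraper {

    // Placeholder markup mirroring the question's structure; the hrefs are hypothetical.
    static final String HTML =
            "<div class=\"_6d3hm _mnav9\">"
            + "<div class=\"_mck9w _gvoze _tn0ps\"><a href=\"/p/1/\">_</a></div>"
            + "<div class=\"_mck9w _gvoze _tn0ps\"><a href=\"/p/2/\">_</a></div>"
            + "</div>"
            + "<div class=\"_6d3hm _mnav9\">"
            + "<div class=\"_mck9w _gvoze _tn0ps\"><a href=\"/p/3/\">_</a></div>"
            + "</div>";

    // "div._mck9w._gvoze._tn0ps" = a div carrying all three classes (no spaces between them);
    // "> a" = an <a> that is a direct child of that div.
    static final String LINK_SELECTOR = "div._mck9w._gvoze._tn0ps > a";

    static List<String> scrapeLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select(LINK_SELECTOR)) {
            // absUrl() resolves a relative href against baseUri.
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // "https://example.com/" is a placeholder base URI.
        for (String link : scrapeLinks(HTML, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

When fetching the live page instead, Jsoup.connect(url).get() returns a Document whose base URI is already set to the fetched URL, so absUrl("href") works the same way.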