Reputation: 685
I'm learning jsoup for use in Java. First of all, I don't really understand the difference between jsoup's "Elements" and "Element", or when to use each. Here's an example of what I'm trying to do. Using this URL, http://en.wikipedia.org/wiki/List_of_bow_tie_wearers#Architects, I want to extract the names listed under the category "Architects". I've tried this:
Document doc = null;
try {
    doc = Jsoup.connect("http://en.wikipedia.org/wiki/List_of_bow_tie_wearers").get();
} catch (IOException e) {
}
Elements basics = doc.getElementsByClass("mw-redirect");
String text = basics.text();
System.out.println(text);
Here is the output:
run:
Franklin Roosevelt Arthur Schlesinger, Jr. Reagan administration University of Colorado at Boulder Eric R. Kandel Eugene H. Spafford Arthur Schlesinger, Jr. middle finger John Daly Sir Robin Day Today show Tom Oliphant Today show Harry Smith TV chef Panic! At The Disco Watergate Watergate Hillary Clinton Donald M. Payne, Jr. Franklin Roosevelt Baldwin–Wallace College Howard Phillips Twilight Sparkle Gil Chesterton Bertram Cooper Richard Gilmore Dr. Donald "Ducky" Mallard, M.D., M.E. Medical Examiner Brother Mouzone hitman Buckaroo Banzai Conan Edogawa Jack Point Waylon Smithers Franklin Roosevelt NFL Chronicle of Higher Education Evening Standard
I'm really just trying to learn the basics of traversing an HTML document, but I'm having trouble with the jsoup cookbook, which I find confusing as a beginner. Any help is appreciated.
Upvotes: 1
Views: 1548
Reputation: 171
Regarding your first question: the difference between Elements and Element is, as the names indicate, the number of items.
An object of type Element represents a single HTML element (one tag and its contents). An object of type Elements is a list of Element objects; it is what methods such as select() and getElementsByClass() return.
If you take a look at the constructors in the API documentation for Element and Elements, this becomes rather obvious.
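To make the distinction concrete, here is a minimal sketch that runs offline. The HTML string and the class name ElementVsElements are made up for illustration and are not taken from the Wikipedia page; the jsoup calls used are Jsoup.parse, select, first, size, text and attr.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ElementVsElements {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<ul><li><a href='/a'>Alpha</a></li><li><a href='/b'>Beta</a></li></ul>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // select() returns Elements: every match, in document order.
        Elements links = doc.select("a");
        System.out.println(links.size());   // prints 2

        // first() narrows the collection down to a single Element (or null if empty).
        Element first = links.first();
        System.out.println(first.text());   // prints Alpha

        // text() on Elements joins the text of all matches with spaces.
        System.out.println(links.text());   // prints Alpha Beta

        // Elements is iterable, so each item in the loop is one Element.
        for (Element link : links) {
            System.out.println(link.attr("href"));
        }
    }
}
```

The third call is the reason the output in the question is one long run of text: text() on Elements concatenates the text of every matched element.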
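Now for the parsing part. In your code you are looking for "mw-redirect", which is not enough: as your output shows, that class matches links all over the page, not just those under Architects. You first need to "navigate" to the correct section.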
I've made a working sample here:
// Imports needed by this snippet
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = null;
try {
    doc = Jsoup.connect("http://en.wikipedia.org/wiki/List_of_bow_tie_wearers").get();
} catch (IOException e) {
    // Report the failure instead of swallowing it; doc stays null
    // and the block below is skipped.
    e.printStackTrace();
}
if (doc != null) {
    // The Architects headline has an id. Awesome! Let's select it.
    Element architectsHeadline = doc.select("#Architects").first();
    // For educational purposes, let's see what we've got there...
    System.out.println(architectsHeadline.html());
    // Now we need something other than .first(), since we want
    // the tag that follows the h3 containing the Architects id.
    // We jump back to the h3 using .parent() and select the succeeding tag.
    Element architectsList = architectsHeadline.parent().nextElementSibling();
    // Again, let's have a peek
    System.out.println(architectsList.html());
    // OK, now that we have our list, let's traverse it.
    // For this we select every a tag directly inside each li tag
    // via a CSS selector.
    for (Element e : architectsList.select("li > a")) {
        System.out.println(e.text());
    }
}
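If you want to experiment without hitting the network, the same navigation can be tried against a small inline document. This is only a sketch: the markup below is hypothetical and merely modeled on the structure the sample above relies on (an h3 containing an element with the section id, followed by a list), and the live page may be laid out differently. The class name SectionLinks and the helper namesUnder are made up for illustration, and eachText() assumes a reasonably recent jsoup version.

```java
import java.util.Collections;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SectionLinks {
    // Hypothetical markup: two sections, each an h3 followed by a list.
    static final String HTML =
        "<h3><span id='Architects'>Architects</span></h3>"
      + "<ul><li><a href='/wiki/A'>Alice Architect</a>, designer</li>"
      + "<li><a class='mw-redirect' href='/wiki/B'>Bob Builder</a></li></ul>"
      + "<h3><span id='Chefs'>Chefs</span></h3>"
      + "<ul><li><a class='mw-redirect' href='/wiki/C'>Carol Cook</a></li></ul>";

    // Returns the link texts from the list that follows the given headline,
    // or an empty list if any navigation step finds nothing.
    static List<String> namesUnder(Document doc, String headlineId) {
        Element headline = doc.getElementById(headlineId);
        if (headline == null || headline.parent() == null) {
            return Collections.emptyList();
        }
        Element list = headline.parent().nextElementSibling();
        if (list == null) {
            return Collections.emptyList();
        }
        return list.select("li > a").eachText();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // Scoped to one section:
        System.out.println(namesUnder(doc, "Architects")); // prints [Alice Architect, Bob Builder]
        // Selecting by class alone crosses section boundaries:
        System.out.println(doc.getElementsByClass("mw-redirect").eachText()); // prints [Bob Builder, Carol Cook]
    }
}
```

The null checks matter on a real page: first(), getElementById() and nextElementSibling() all return null when nothing matches, so a layout change would otherwise surface as a NullPointerException.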
I hope this helps.
Upvotes: 4