Reputation: 12363
I am trying to parse the HTML of the following URL:
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/
to obtain the text of the <p> tag that contains the name of an instructor. The required information is inside <p> tags, but I am unable to retrieve them using Jsoup. I have no idea what I am doing wrong: when I save the tag in an Element object, let's call it 'b', and call b.getAllElements(), it doesn't show <p> as one of the elements. Isn't that what Jsoup's getAllElements() method does? If not, could someone please explain the hierarchy that I am obviously missing? The parser is not able to locate the <p> tag containing the text I need, which in this case is "Prof. Zoltan Spakovszky".
Any help would be greatly appreciated.
public void getHomePageLinks()
{
    String html = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
    org.jsoup.nodes.Document doc = Jsoup.parse(html);
    Elements bodies = doc.select("body");
    for (Element body : bodies)
    {
        System.out.println(body.getAllElements());
    }
}
The output is:
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/
Isn't it supposed to print all the elements within the body tag of the document?
Upvotes: 1
Views: 21419
Reputation: 64
Elements ele = doc.select("p");
String text = ele.text();
System.out.println(text);
Try this; I think it will work.
Upvotes: 0
Reputation: 145
Maybe you have already solved this, but I worked on it, so I can't resist posting:
import java.io.IOException;
import java.util.logging.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class JavaApplication17 {
    public static void main(String[] args) {
        try {
            String url = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
            // Fetch the page over HTTP and parse the response
            Document doc = Jsoup.connect(url).get();
            // Select every <p> element and print its text
            Elements paragraphs = doc.select("p");
            for (Element p : paragraphs)
                System.out.println(p.text());
        }
        catch (IOException ex) {
            Logger.getLogger(JavaApplication17.class.getName())
                  .log(Level.SEVERE, null, ex);
        }
    }
}
Is this what you meant?
Upvotes: 3
Reputation: 271
Here is some code:
Document document = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();
Elements elements = document.select("p");
System.out.println(elements.html());
You can select all <p> tags using Jsoup's selector syntax. Calling html() on the result returns the text and nested tags of each <p>.
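To make the difference between html() and text() on a selection concrete, here is a small offline sketch. The HTML snippet and the class and method names are made up for illustration and are not taken from the real page:

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class HtmlVsTextSketch {
    // Hypothetical markup used only for this illustration
    public static final String SAMPLE_HTML =
        "<html><body><p>Hello <b>world</b></p><p>Second</p></body></html>";

    // Inner HTML of every matched <p>, nested tags included
    public static String paragraphHtml(String html) {
        Elements paragraphs = Jsoup.parse(html).select("p");
        return paragraphs.html();
    }

    // Text of every matched <p>, with the markup stripped
    public static String paragraphText(String html) {
        Elements paragraphs = Jsoup.parse(html).select("p");
        return paragraphs.text();
    }

    public static void main(String[] args) {
        System.out.println(paragraphHtml(SAMPLE_HTML));
        System.out.println(paragraphText(SAMPLE_HTML));
    }
}
```

If only the instructor's name is needed, text() is probably the more convenient of the two, since it drops the nested tags.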
Upvotes: 1
Reputation: 25380
Here's a short example:
// Connect to the website and parse it into a document
Document doc = Jsoup.connect("http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/").get();
// Select all elements you need (see below for documentation)
Elements elements = doc.select("div[class=chpstaff] p");
// Get the text of the first element
String instructor = elements.first().text();
// eg. print the result
System.out.println(instructor);
Take a look at the documentation of the jsoup selector API here: Jsoup Codebook
It's not very difficult to use, but it is very powerful.
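As for why the code in the question prints only the URL: Jsoup.parse(String) treats its argument as HTML markup, not as an address to fetch, so the URL ends up as plain text inside the body. A minimal offline sketch of that behaviour follows. The sample markup and the class and method names are assumptions for illustration; the snippet only imitates the div with class chpstaff, and the real page may be structured differently:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseVsConnectSketch {
    // Hypothetical markup standing in for the real page
    public static final String SAMPLE_HTML =
        "<html><body><div class=\"chpstaff\"><h2>Instructors</h2>"
        + "<p>Prof. Zoltan Spakovszky</p></div><p>Other text</p></body></html>";

    // Parsing a URL string as if it were HTML: the whole string becomes body text
    public static String bodyTextOfParsedUrl(String url) {
        Document doc = Jsoup.parse(url);
        return doc.body().text();
    }

    // Parsing real markup: the selector can now find the paragraph
    public static String firstStaffParagraph(String html) {
        Document doc = Jsoup.parse(html);
        Element p = doc.select("div.chpstaff p").first();
        return p == null ? null : p.text();
    }

    public static void main(String[] args) {
        String url = "http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-050-thermal-energy-fall-2002/";
        System.out.println(bodyTextOfParsedUrl(url));       // prints the URL itself
        System.out.println(firstStaffParagraph(SAMPLE_HTML));
    }
}
```

For a live page, Jsoup.connect(url).get() fetches the document first, after which the same selector can be applied.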
Upvotes: 2
Reputation: 397
I don't know anything about Jsoup, but it seems that if you want the instructor's name, you could access it with something like:
Element instructor = doc.select("div.chpstaff div p").first();
Upvotes: 3