Reputation: 1115
I am using jsoup to parse a web page with the following call:
Document document = Jsoup.connect("http://www.blablabla.de/").get();
When I then print it with
System.out.println(document.toString());
I get the desired result. But after saving the same web page locally and running the same operation on the saved file
Document doc = Jsoup.parse("/user/test/test.html","UTF-8");
System.out.println(doc.toString());
I get this instead:
<html>
 <head></head>
 <body>
  /home/1.html
 </body>
</html>
My second issue is that I want to get the content of every single div of a specific class. I am using
Elements elements = document.select("div.things.subthings");
The divs I want to match look like this:
<div class="col_a col text">
<div class="text">
done
</div>
</div>
Upvotes: 2
Views: 55
Reputation: 43033
But saving the subject webpage and then trying to do the same operation
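You are calling the wrong overload. The call above resolves to this one, which treats its first argument as HTML markup, so the path string itself is parsed as the page content:
static Document parse(String html, String baseUri) // Parse HTML into a Document.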
You want to call this one:
static Document parse(File in, String charsetName) // Parse the contents of a file as HTML.
Try this instead:
Document doc = Jsoup.parse(new File("/user/test/test.html"), "UTF-8");
System.out.println(doc.toString());
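To make the difference between the two overloads concrete, here is a minimal sketch. The class name, the helper method names, and the temporary sample file are all made up for illustration; only the two Jsoup.parse overloads come from the discussion above. Note that the File overload throws IOException, so the caller has to handle or declare it.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseOverloads {

    // parse(String html, String baseUri): the text is treated as markup,
    // so a file path simply becomes the body text of an otherwise empty page.
    static Document parseAsMarkup(String text) {
        return Jsoup.parse(text, "UTF-8"); // "UTF-8" is taken as the base URI here
    }

    // parse(File, String charsetName): the file is read from disk and its contents parsed.
    static Document parseFromFile(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8"); // "UTF-8" is the charset here
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only to keep the sketch self-contained.
        Path tmp = Files.createTempFile("sample", ".html");
        String page = "<html><head><title>Saved page</title></head>"
                    + "<body><p>hello</p></body></html>";
        Files.write(tmp, page.getBytes(StandardCharsets.UTF_8));

        Document wrong = parseAsMarkup(tmp.toString());
        Document right = parseFromFile(tmp.toFile());

        System.out.println(wrong.body().text()); // the path string itself
        System.out.println(right.title());       // the title read from the file

        Files.delete(tmp);
    }
}
```

The first line printed is the path, which is the same symptom as in the question; the second is the title from the saved file.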
My second issue is that I want to get the content of every single div of a specific class.
Try one of the CSS queries below:
For finding all divs with class="col_a col text"
div.col_a.col.text
For finding all divs with class="col_a col text" or class="text"
div.col_a.col.text, div.text
For finding all divs with class="col_a col text" that have a div with class="text" among their descendants
div.col_a.col.text:has(div.text)
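As a rough check, the three queries can be run against the markup from the question. This is only a sketch: the class name and the select helper are made up for illustration, and the last query (using the child combinator to reach the inner div directly) is my own addition, not one of the three above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SelectDivs {

    // The markup from the question, inlined so the sketch is self-contained.
    static final String HTML =
          "<div class=\"col_a col text\">"
        + "<div class=\"text\">done</div>"
        + "</div>";

    // Hypothetical helper: parse the sample markup and run one CSS query on it.
    static Elements select(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery);
    }

    public static void main(String[] args) {
        // Matches divs carrying all three classes: the outer div only.
        System.out.println(select("div.col_a.col.text").size());
        // Matches either pattern: the outer and the inner div.
        System.out.println(select("div.col_a.col.text, div.text").size());
        // Matches outer divs that contain a div.text descendant.
        System.out.println(select("div.col_a.col.text:has(div.text)").size());

        // Assumed variant: select the inner divs directly and print their text.
        for (Element inner : select("div.col_a.col.text > div.text")) {
            System.out.println(inner.text());
        }
    }
}
```

To get the content of each match, iterate over the returned Elements and call text() (or html() for the inner markup) on each one.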
Upvotes: 2