FarFarAway

Reputation: 1115

jsoup: unexpected behaviour when parsing a saved file, and finding all divs with a given class

I am using jsoup to parse a webpage with the following code:

Document document = Jsoup.connect("http://www.blablabla.de/").get();

Then I print it:

System.out.println(document.toString());

This gives the expected output. However, when I save the page to disk and try the same operation on the saved file:

Document doc = Jsoup.parse("/user/test/test.html","UTF-8");
System.out.println(doc.toString());

I get only this:

<html>
 <head></head>
 <body>
  /home/1.html
 </body>
</html>

My second issue: I want to get the content of every div with a specific class. I am using:

Elements elements = document.select("div.things.subthings");

The divs I want to match look like this:

<div class="col_a col text">
    <div class="text">
     done
    </div>
</div>

Upvotes: 2

Views: 55

Answers (1)

Stephan

Reputation: 43033

But saving the subject webpage and then trying to do the same operation

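You are calling the wrong overload. Because both arguments are strings, this is the method that actually runs:

static Document parse(String html, String baseUri) // Parse HTML into a Document.

It treats the path itself as HTML source and uses "UTF-8" as the base URI, so the resulting document's body contains nothing but the path text.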
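A minimal sketch reproducing the symptom, assuming jsoup is on the classpath (the class name WrongOverloadDemo is hypothetical). It shows that the two-string overload never opens a file:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WrongOverloadDemo {
    public static void main(String[] args) {
        // Jsoup.parse(String html, String baseUri): the first argument is
        // treated as HTML source, not as a path, and "UTF-8" becomes the base URI.
        Document doc = Jsoup.parse("/user/test/test.html", "UTF-8");

        // The path string ends up as the only text inside <body>.
        System.out.println(doc.body().text()); // prints /user/test/test.html
    }
}
```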

You want to call this one:

static Document parse(File in, String charsetName) // Parse the contents of a file as HTML.

Try this instead:

import java.io.File;

Document doc = Jsoup.parse(new File("/user/test/test.html"), "UTF-8"); // throws IOException
System.out.println(doc.toString());
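A self-contained sketch of the fix, assuming jsoup is on the classpath. A temporary file with made-up markup stands in for the saved page, and the class and method names (ParseSavedPage, parseSaved) are hypothetical. It also shows an alternative: read the file yourself and pass its contents, not its path, to parse(String html, String baseUri):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseSavedPage {
    // Parse a saved page with the File overload; jsoup reads and decodes the file.
    static Document parseSaved(Path path) throws IOException {
        return Jsoup.parse(path.toFile(), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the saved page; point this at the real file instead.
        Path saved = Files.createTempFile("test", ".html");
        Files.write(saved,
                "<div class=\"text\">done</div>".getBytes(StandardCharsets.UTF_8));

        Document fromFile = parseSaved(saved);

        // Alternative: read the file yourself and pass its *contents* to
        // parse(String html, String baseUri). The base URI is only used to
        // resolve relative links; the value here is an assumption.
        String html = new String(Files.readAllBytes(saved), StandardCharsets.UTF_8);
        Document fromString = Jsoup.parse(html, "http://www.blablabla.de/");

        System.out.println(fromFile.select("div.text").text());   // prints done
        System.out.println(fromString.select("div.text").text()); // prints done

        Files.delete(saved);
    }
}
```

The File overload is usually simpler, since jsoup handles reading and charset decoding. The string route is handy when the HTML already sits in memory.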

My second issue is that I want to get the content of every single div of a specific class.

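Try one of the CSS queries below. In a selector, each .name must match one of the element's classes, so div.things.subthings only matches divs that carry both the things and subthings classes, which the divs above do not.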

To find all divs with class="col_a col text":

div.col_a.col.text

To find all divs with class="col_a col text" OR class="text":

div.col_a.col.text, div.text

To find all divs with class="col_a col text" that have a div with class="text" among their descendants:

div.col_a.col.text:has(div.text)
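A sketch that runs the three queries against the markup from the question and prints each matched div's content with text(). It assumes jsoup is on the classpath; the class name SelectDivsByClass is hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectDivsByClass {
    // Markup from the question; a real page would come from
    // Jsoup.connect(...).get() or Jsoup.parse(File, charset).
    static final String HTML =
        "<div class=\"col_a col text\">"
        + "<div class=\"text\">done</div>"
        + "</div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // Outer div only: an element must carry all three classes.
        Elements outer = doc.select("div.col_a.col.text");
        // Outer OR inner divs: the comma combines two selectors.
        Elements either = doc.select("div.col_a.col.text, div.text");
        // Outer divs that contain a div.text somewhere below them.
        Elements withInner = doc.select("div.col_a.col.text:has(div.text)");

        System.out.println(outer.size());     // 1
        System.out.println(either.size());    // 2
        System.out.println(withInner.size()); // 1

        // text() returns the element's combined, whitespace-normalised text.
        for (Element div : either) {
            System.out.println(div.text()); // "done" for both divs
        }
    }
}
```

In this particular markup the outer div also carries the class text, so div.text on its own already matches both divs.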

Upvotes: 2
