EyfI

Reputation: 1005

Extract text from only some divs in the same class with jsoup

I would like to extract the text from a specific <div> of a web page using jsoup, but I'm not sure how.

The problem is that I want the text from a <div> that has class="name".

But there can be several <div>s with this class, and I don't want the text from the others.

It looks like this in the HTML file:

.  
.
<div class="name">
Some text I don't want
<span class="a">Tree</span>
</div>
.  
.
<div class="name">Some text I do want</div>
.  
.

So the only difference is that the <div> I want the text from does not have a <span> inside it. But I have not found a way to use that as a criterion for extracting the text in jsoup.

Is it possible?

Upvotes: 0

Views: 890

Answers (2)

Hovercraft Full Of Eels

Reputation: 285403

Use jsoup's selector syntax. For instance, to select all divs with class="name", use:

    Elements nameElements = doc.select("div.name");

Note that the text you "do" and "don't" want above sits in the same relative position in the HTML, and I have no clue why you want one and not the other. HTML and jsoup will see them the same.

If you want to avoid elements containing span elements, then one way is to iterate through the elements obtained above and test by selector if they have span elements or not:

    Elements nameElements = doc.select("div.name");

    for (Element element : nameElements) {
        if (element.select("span").isEmpty()) {
            System.out.println("No span");
            System.out.println(element.text());
            System.out.println();
        } else {
            System.out.println("span");
            System.out.println(element.text());
            System.out.println();
        }
    }
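As an alternative to the loop, the filter can probably be pushed into the selector itself. This is a minimal sketch, not part of the answer above: it assumes jsoup is on the classpath and relies on jsoup's `:not(selector)` and `:has(selector)` pseudo-selectors. The class name `NameDivSelector` and the method name are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NameDivSelector {

    // Hypothetical helper: returns the text of every div.name
    // that does not contain a span anywhere inside it.
    static List<String> textOfNameDivsWithoutSpan(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // :has(span) matches elements with a span descendant; :not(...) inverts that.
        for (Element div : doc.select("div.name:not(:has(span))")) {
            result.add(div.text());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question's snippet.
        String html = "<div class=\"name\">Some text I don't want<span class=\"a\">Tree</span></div>"
                + "<div class=\"name\">Some text I do want</div>";
        System.out.println(textOfNameDivsWithoutSpan(html));
    }
}
```

The selector version keeps the rule in one place, while the loop above is easier to extend if the condition later becomes more complicated than "has a span".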

Upvotes: 1

Andrei Volgin

Reputation: 41089

You can select all div elements with class="name" and then loop through them. Check whether each element has child elements; if it has none, it is the div you want.
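A minimal sketch of that check, assuming jsoup is on the classpath. The class name `LeafDivText`, the method name, and the inline HTML string are made up for illustration. `Element.children()` returns only child elements, not text nodes, so it is empty for a div that holds nothing but text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LeafDivText {

    // Hypothetical helper: returns the text of every div.name
    // that has no child elements at all.
    static List<String> textOfLeafNameDivs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element div : doc.select("div.name")) {
            // children() lists element children only; plain text does not count.
            if (div.children().isEmpty()) {
                result.add(div.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the question's snippet.
        String html = "<div class=\"name\">Some text I don't want<span class=\"a\">Tree</span></div>"
                + "<div class=\"name\">Some text I do want</div>";
        System.out.println(textOfLeafNameDivs(html)); // prints [Some text I do want]
    }
}
```

Note that this is stricter than testing for a span: a wanted div that contained, say, a `<b>` tag would be skipped as well.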

Upvotes: 0
