goat

Reputation: 307

Jsoup - Extract from html string with the same span class name

I'm still new to HTML. For an Android project, I need to extract some data from an HTML string using jsoup. The structure looks something like this: all the span tags have the same class name, and the data I need sits between them.

<span class="head">a</span>
xxxx data xxxx
<span class="head">b</span>
xxxx data xxxx
<span class="head">c</span>
xxxx data xxxx

Is there any way I could extract it?

Upvotes: 1

Views: 990

Answers (2)

Mohamed Nasser

Reputation: 56

Try this code; it works fine.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {

    public static void main(String[] args) {
        String html = "<span class=\"head\">a</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">b</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">c</span>\n" +
                "xxxx data xxxx";

        Document document = Jsoup.parse(html);

        // Prints the text inside each span.head element (a, b, c).
        for (Element element : document.select("span.head")) {
            System.out.println(element.text());
        }
    }
}

Upvotes: 0

Szymon Stepniak

Reputation: 42194

There are 2 things you have to do:

  • select all elements that precede the text nodes you are interested in,
  • use the nextSibling method to get each text node.

Take a look at this sample code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class JsoupExample {

    public static void main(String[] args) {
        String html = "<span class=\"head\">a</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">b</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">c</span>\n" +
                "xxxx data xxxx";

        Document document = Jsoup.parse(html);

        for (Element span : document.select("span.head")) {
            TextNode node = (TextNode) span.nextSibling();

            // text() collapses the surrounding line breaks into spaces, so trim them off.
            String data = node.text().trim();

            assert "xxxx data xxxx".equals(data);

            System.out.println(data);
        }
    }
}

It uses your input and shows both steps.

With document.select("span.head") we select all span elements with the class head, then iterate over them with a for-each loop. For each one we get the text node that follows it using TextNode node = (TextNode) span.nextSibling(); Finally, an assertion checks that the text node holds the value we expect, and we print it to standard output.

Modify this code sample for your needs. I hope it helps you.
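
As a starting point for that kind of modification, here is a minimal sketch that pairs each span's text with the data that follows it. The class name HeadDataExtractor, the extract helper, and the sample data values are hypothetical and not part of jsoup. It checks the sibling's type instead of casting, since nextSibling() can return null or a non-text node, and it trims the result because TextNode.text() keeps the spaces left over from line breaks.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.LinkedHashMap;
import java.util.Map;

class HeadDataExtractor {

    // Hypothetical helper: maps the text of each span.head to the text node right after it.
    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Document document = Jsoup.parse(html);

        for (Element span : document.select("span.head")) {
            Node next = span.nextSibling();

            // Skip spans that are followed by nothing or by another element.
            if (next instanceof TextNode) {
                result.put(span.text(), ((TextNode) next).text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input with made-up data values, for illustration only.
        String html = "<span class=\"head\">a</span>\n" +
                "first\n" +
                "<span class=\"head\">b</span>\n" +
                "second";

        extract(html).forEach((head, data) -> System.out.println(head + " -> " + data));
    }
}
```

A LinkedHashMap keeps the entries in document order. If two spans can share the same text, a list of pairs would be a better fit than a map, since later entries would overwrite earlier ones.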

Upvotes: 3
