alvas

How to get elements by tag name using JSoup? - java

How do I get elements by tag name using JSoup (http://jsoup.org/)?

I have the following input and need the following output, but I am not getting the text inside the <source>...</source> tags:

[in:]

<html>
  <something>
    <source>foo bar bar</source>
  </something>
  <source>foo foo bar</source>
</html>

[desired out:]

foo bar bar
foo foo bar

I have tried this:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HelloJsoup {
    public static void main(String[] args) throws IOException {
        String br = "<html><source>foo bar bar</source></html>";
        // Parse with jsoup's default (HTML) parser
        Document doc = Jsoup.parse(br);
        // Print every <source> element, including its markup
        for (Element sentence : doc.getElementsByTag("source")) {
            System.out.print(sentence);
        }
    }
}

but it outputs:

<source></source>
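To see where the text actually goes, here is a small diagnostic sketch. The class and method names are made up for illustration, and it assumes jsoup is on the classpath. It parses the same string with the default HTML parser and prints the text found inside <source> next to the text of the whole <body>:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, only for illustrating the HTML parser's behaviour
public class SourceTagDiagnosis {
    // Text that jsoup's default HTML parser leaves inside the first <source>
    static String sourceText(String html) {
        Document doc = Jsoup.parse(html);
        Element source = doc.getElementsByTag("source").first();
        return source == null ? null : source.text();
    }

    // Text of the whole <body> after HTML parsing
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        String br = "<html><source>foo bar bar</source></html>";
        System.out.println("inside <source>: '" + sourceText(br) + "'");
        System.out.println("body text:       '" + bodyText(br) + "'");
    }
}
```

With the HTML parser the first line should come out empty while the body still contains foo bar bar, which is consistent with the <source></source> output above: the text is there, just not inside the element.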

Answers (1)

ashatte

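jsoup's default HTML parser treats <source> as an HTML5 void element (the same way it treats <img>), so the closing </source> is ignored and the text ends up next to the element instead of inside it. Parse the input as XML instead by passing Parser.xmlParser() to the three-argument parse() method, and print sentence.text() rather than the element itself: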

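import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

String br = "<html><source>foo bar bar</source></html>";
// Parse as XML so <source> is an ordinary element, not an HTML void tag
Document doc = Jsoup.parse(br, "", Parser.xmlParser());

for (Element sentence : doc.getElementsByTag("source")) {
    System.out.println(sentence.text()); // print only the element's text
}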

More on this in the docs: http://jsoup.org/apidocs/org/jsoup/parser/Parser.html#xmlParser()
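Applied to the full input from the question (two <source> elements, one nested inside <something>), a minimal complete program might look like the sketch below. The class name SourceTexts and the helper sourceTexts() are illustrative, and jsoup must be on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class SourceTexts {
    // Collects the text of every <source> element, in document order.
    // Parser.xmlParser() treats <source> as an ordinary element, so its
    // text content stays inside it.
    static List<String> sourceTexts(String markup) {
        Document doc = Jsoup.parse(markup, "", Parser.xmlParser());
        List<String> texts = new ArrayList<>();
        for (Element source : doc.getElementsByTag("source")) {
            texts.add(source.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String input = "<html>\n"
                + "  <something>\n"
                + "    <source>foo bar bar</source>\n"
                + "  </something>\n"
                + "  <source>foo foo bar</source>\n"
                + "</html>";
        for (String text : sourceTexts(input)) {
            System.out.println(text);
        }
    }
}
```

getElementsByTag() searches the whole subtree, so the nested <source> is found as well, and the matches come back in document order, giving the two desired output lines.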
