johnny243

Reputation: 314

Extract text outside of an HTML tag

I have the following HTML code:

<div class=example>Text #1</div> "Another Text 1"
<div class=example>Text #2</div> "Another Text 2"

I want to extract the text outside the tags, i.e. "Another Text 1" and "Another Text 2".

I'm using jsoup to parse the HTML.

Any ideas?

Thanks!

Upvotes: 3

Views: 1883

Answers (2)

ollo

Reputation: 25380

You can select the next Node (not Element!) after each div tag. In your example, each of these nodes is a TextNode.

final String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
                  + "<div class=example>Text #2</div> \"Another Text 2\" ";

Document doc = Jsoup.parse(html);

for( Element element : doc.select("div.example") ) // Select all the div tags
{
    TextNode next = (TextNode) element.nextSibling(); // Get the next node of each div as a TextNode

    System.out.println(next.text()); // Print the text of the TextNode
}

Output:

 "Another Text 1" 
 "Another Text 2" 
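The cast in the snippet above relies on every div being followed by a text node. If a div is the last node in its parent, nextSibling() returns null, and if another element follows it directly, the cast throws a ClassCastException. A more defensive sketch might look like the following; the class name TrailingText and the helper textAfterDivs are made up for illustration and are not part of jsoup:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

class TrailingText {

    // Hypothetical helper: collects the non-blank text that directly follows each div.example.
    static List<String> textAfterDivs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();

        for (Element div : doc.select("div.example")) {
            Node next = div.nextSibling(); // null if the div is the last node in its parent

            // Only handle text nodes; skip elements, comments and null
            if (next instanceof TextNode) {
                String text = ((TextNode) next).text().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
                    + "<div class=example>Text #2</div> \"Another Text 2\" ";

        for (String text : textAfterDivs(html)) {
            System.out.println(text);
        }
    }
}
```

Trimming the text is a choice made here, not something the original snippet does; drop the trim() call if the surrounding whitespace matters.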

Upvotes: 2

ashatte

Reputation: 5538

One solution is to use the ownText() method (see the Jsoup docs). It returns only the text owned directly by the given element and ignores any text inside its child elements.

jsoup wraps the parsed fragment in a <body> element, so with only the HTML you provided, you can extract the own text of <body>:

String html = "<div class='example'>Text #1</div> 'Another Text 1'<div class='example'>Text #2</div> 'Another Text 2'";

Document doc = Jsoup.parse(html);
System.out.println(doc.body().ownText());

This outputs:

'Another Text 1' 'Another Text 2'

Note that the ownText() method can be used on any Element. There's another example in the docs.
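As a rough sketch of ownText() on an element other than <body>, the following compares it with text() on a paragraph that contains a child element. The class name OwnTextDemo and the helper ownTextOf are invented for this example, and the HTML string is not from the question:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OwnTextDemo {

    // Hypothetical helper: own text of the first element matching the CSS query, or "" if none matches.
    static String ownTextOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element el = doc.select(cssQuery).first();
        return el == null ? "" : el.ownText();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>there</b> now!</p>";

        Element p = Jsoup.parse(html).select("p").first();
        System.out.println(p.text());             // text of p and its children: Hello there now!
        System.out.println(ownTextOf(html, "p")); // text owned by p only:      Hello now!
    }
}
```

Because ownText() joins all text nodes of the element into one string, it does not tell you which text followed which div; iterating over sibling nodes is the better fit when that pairing matters.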

Upvotes: 4
