Reputation: 314
I have the following HTML code:
<div class=example>Text #1</div> "Another Text 1"
<div class=example>Text #2</div> "Another Text 2"
I want to extract the text outside the tags: "Another Text 1" and "Another Text 2".
I'm using JSoup to achieve this.
Any ideas?
Thanks!
Upvotes: 3
Views: 1883
Reputation: 25380
You can select the next Node (not Element!) of each div tag. In your example they are all TextNodes.
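To make the Node vs. Element distinction concrete, here is a minimal sketch that contrasts nextSibling() with nextElementSibling() on the first div. The class name SiblingDemo and the helper textAfter are made up for illustration; the instanceof check is an added safeguard for the case where the next node is missing or is not text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SiblingDemo {
    // Hypothetical helper: returns the trimmed text that directly follows
    // the element, or "" if the next node is missing or is not a TextNode.
    static String textAfter(Element element) {
        Node next = element.nextSibling();
        return (next instanceof TextNode) ? ((TextNode) next).text().trim() : "";
    }

    public static void main(String[] args) {
        String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
                + "<div class=example>Text #2</div> \"Another Text 2\" ";
        Document doc = Jsoup.parse(html);
        Element first = doc.select("div.example").first();

        // nextElementSibling() skips text nodes and lands on the second div
        System.out.println(first.nextElementSibling().text());
        // nextSibling() returns the text node sitting between the two divs
        System.out.println(textAfter(first));
    }
}
```

The first call should reach the second div's content, while the second call reaches the loose text the question is after, which is why the Node API is the one to use here.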
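import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

final String html = "<div class=example>Text #1</div> \"Another Text 1\"\n"
        + "<div class=example>Text #2</div> \"Another Text 2\" ";
Document doc = Jsoup.parse(html);
for (Element element : doc.select("div.example")) { // Select all the div tags
    Node next = element.nextSibling(); // Get the node that follows each div
    if (next instanceof TextNode) { // Skip divs followed by nothing or by another element
        System.out.println(((TextNode) next).text().trim()); // Print the text of the TextNode
    }
}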
Output:
"Another Text 1"
"Another Text 2"
Upvotes: 2
Reputation: 5538
One solution is to use the ownText() method (see the Jsoup docs). This method returns only the text owned by the specified element and ignores any text owned by its child elements.
Using only the HTML that you provided, you could extract the <body> own text:
String html = "<div class='example'>Text #1</div> 'Another Text 1'<div class='example'>Text #2</div> 'Another Text 2'";
Document doc = Jsoup.parse(html);
System.out.println(doc.body().ownText());
Will output:
'Another Text 1' 'Another Text 2'
Note that the ownText() method can be used on any Element. There's another example in the docs.
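As a rough illustration of calling ownText() on an element other than <body>, the sketch below wraps the markup in a container div. The wrapper id, the class name OwnTextDemo and the helper ownTextOf are assumptions made for this example and are not part of the HTML in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {
    // Hypothetical helper: returns the text owned directly by the first element
    // matching the selector, excluding text inside its child elements.
    static String ownTextOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element el = doc.select(cssQuery).first();
        return el == null ? "" : el.ownText();
    }

    public static void main(String[] args) {
        // Assumed wrapper element, added only for this example
        String html = "<div id=wrapper><div class=example>Text #1</div> Another Text 1</div>";
        System.out.println(ownTextOf(html, "#wrapper"));
    }
}
```

Keep in mind that ownText() joins all of the element's own text nodes into one string, so it fits best when you do not need each piece separately.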
Upvotes: 4