Reputation: 4877
I have an element that looks like this:
<li> this is before <span class="between"> this is between </span> this is after </li>
How can I obtain the array {"this is before", "this is after"} using JSoup?
Note: the text could contain several spans, but only one of class between. So, for example,
<li>
this
<span class="other"> is </span>
before
<span class="between"> this is between </span>
this is
<span class="other"> after </span>
</li>
should also produce {"this is before", "this is after"}.
Upvotes: 1
Views: 774
Reputation: 25370
You can iterate over the li's child nodes:
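import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class Main {
    public static void main(String[] args) {
        final String html = "<li> \n"
                + "this \n"
                + "<span class=\"other\"> is </span> \n"
                + "before \n"
                + "<span class=\"between\"> this is between </span> \n"
                + "this is \n"
                + "<span class=\"other\"> after </span> \n"
                + "</li>";

        Document doc = Jsoup.parse(html);
        Element li = doc.select("li").first();

        List<String> text = new ArrayList<>();
        StringBuilder sb = new StringBuilder();

        for (Node node : li.childNodes()) { // Iterate over the child nodes
            if (node instanceof TextNode) { // Plain text
                // text() gives the unescaped text; toString() would return HTML (e.g. "&amp;")
                sb.append(((TextNode) node).text());
            } else if (node instanceof Element) { // Element
                final Element element = (Element) node;

                if (element.tagName().equals("span") && element.hasClass("between")) {
                    // Span with 'between' class: close the current part and start a new one
                    text.add(sb.toString().trim());
                    sb = new StringBuilder();
                } else { // Every other element
                    // ownText() takes only the element's own text, not that of nested elements
                    sb.append(element.ownText());
                }
            }
        }

        text.add(sb.toString().trim()); // Add whatever follows the 'between' span
        System.out.println(text);
    }
}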
Output:
[this is before, this is after]
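If you need this in more than one place, the same loop could be wrapped in a small method that takes the class name to split on. This is only a sketch: `splitText` and `SplitAroundElement` are names made up for illustration, not part of jsoup, and it assumes the splitting element is a direct child of the parent. Unlike the snippet above it uses `text()` for the other elements, so text inside nested tags is kept as well.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SplitAroundElement {

    // Hypothetical helper (not a jsoup API): collects the text of 'parent' and
    // starts a new entry each time a direct child has the class 'splitClass'.
    static List<String> splitText(Element parent, String splitClass) {
        List<String> parts = new ArrayList<>();
        StringBuilder sb = new StringBuilder();

        for (Node node : parent.childNodes()) {
            if (node instanceof TextNode) {
                // Whitespace-normalized text of the text node
                sb.append(((TextNode) node).text());
            } else if (node instanceof Element) {
                Element element = (Element) node;
                if (element.hasClass(splitClass)) {
                    // Splitting element: finish the current part, skip its content
                    parts.add(sb.toString().trim());
                    sb.setLength(0);
                } else {
                    // Any other element: keep its text, including nested children
                    sb.append(element.text());
                }
            }
        }

        parts.add(sb.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        String html = "<li> this is before "
                + "<span class=\"between\"> this is between </span>"
                + " this is after </li>";
        Element li = Jsoup.parse(html).select("li").first();
        System.out.println(splitText(li, "between"));
    }
}
```

Note that the parts are only joined as they appear in the markup, so if there is no whitespace between a text node and a neighbouring element, the words will run together.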
Upvotes: 1