Ferb

Reputation: 114

How to select text in HTML tag without a tag around it (JSoup)

I would like to select the text inside the strong tag, but without the div nested in it.

Is there a way to do this with jsoup directly?

My attempt at the selection (doesn't work, it selects the full content inside the strong tag):

Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");

HTML:

<strong>
   I want that text
   <div class="dontwantthatclass">
   </div>
</strong>

Upvotes: 6

Views: 1909

Answers (3)

RanchiRhino

Reputation: 774

Have a look at the methods jsoup provides for this: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html. You can use remove(), removeChild(), etc. Another option is a regex. Here is a sample regex that matches a start tag together with its end tag, or a self-closing tag such as <br/>: https://www.debuggex.com/r/1gmcSdz9s3MSimVQ

So you can do something like this (note that this is JavaScript regex-literal syntax, not Java):

selection.replace(/<([^ >]+)[^>]*>.*?<\/\1>|<[^\/]+\/>/ig, "");

You can further modify this regex to match most of your cases.

Another option is to process the selected markup further on the client side with JavaScript or VBScript:

Elements selection = htmlDocument.select("strong");

jQuery code:

var removeHTML = function(text, selector) {
    var wrapped = $("<div>" + text + "</div>");
    wrapped.find(selector).remove();
    return wrapped.html();
}

You can also combine the regular expression with jsoup's ownText() method to get the element's text and strip the unwanted string.

Upvotes: 1

junijo

Reputation: 116

You are looking for the ownText() method.

String txt = htmlDocument.select("strong").first().ownText();
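To illustrate what "own text" means conceptually, here is a minimal sketch that uses only the JDK's DOM parser instead of jsoup, so it assumes the markup is well-formed XML. The class name OwnTextSketch and the helpers directText and parseRoot are hypothetical and are not part of jsoup; the word "unwanted" inside the div is added only to make the difference visible. The idea is to join only the text nodes that are direct children of the element and skip anything inside nested elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextSketch {

    // Hypothetical helper: concatenates only the text nodes that are
    // direct children of el, ignoring text inside nested elements.
    static String directText(Element el) {
        StringBuilder sb = new StringBuilder();
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        // Trim the ends and collapse runs of whitespace to single spaces.
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    // Hypothetical helper: parses well-formed markup and returns its root element.
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String markup = "<strong>\n   I want that text\n"
                + "   <div class=\"dontwantthatclass\">unwanted</div>\n</strong>";
        Element strong = parseRoot(markup);
        System.out.println(directText(strong)); // prints "I want that text"
    }
}
```

With jsoup itself none of this is needed, since the ownText() call above does the equivalent work on the parsed HTML element.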

Upvotes: 10

Mickaël R.

Reputation: 344

I guess you're using jQuery, so you could use the "innerText" property on your "strong" element:

var selection = htmlDocument.select("strong")[0].innerText;

https://jsfiddle.net/scratch_cf/8ds4uwLL/

PS: If you want to wrap the retrieved text into a "strong" tag, I think you'll have to build a new element like $('<strong>retrievedText</strong>');

Upvotes: 0
