Reputation: 4280
I have some text stored in a database, which looks something like below:
let text = "<p>Some people live so much in the future they they lose touch with reality.</p><p>They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>"
The text can have many paragraphs and HTML tags.
Now, I also have a phrase:
let phrase = 'lose touch'
What I want to do is search for the phrase
in text
, and return the complete sentence containing the phrase
in strong
tag.
In the above example, even though the first para also contains the phrase 'lose touch', it should return the second sentence because it is in the second sentence that the phrase is inside strong
tag. The result will be:
They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.
On the client-side, I could create a DOM tree with this HTML text, convert it into an array and search through each item in the array, but in NodeJS document is not available, so this is basically just plain text with HTML tags. How do I go about finding the right sentence in this blob of text?
Upvotes: 5
Views: 562
Reputation: 3966
I think this might help you.
No need to involve DOM in this if I understood the problem correctly.
This solution would work even if the p or strong tags have attributes in them.
And if you want to search for tags other than p, simply update the regex for it and it should work.
const search_phrase = "lose touch";
const strong_regex = new RegExp(`<\s*strong[^>]*>${search_phrase}<\s*/\s*strong>`, "g");
const paragraph_regex = new RegExp("<\s*p[^>]*>(.*?)<\s*/\s*p>", "g");
const text = "<p>Some people live so much in the future they they lose touch with reality.</p><p>They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>";
const paragraphs = text.match(paragraph_regex);
if (paragraphs && paragraphs.length) {
const paragraphs_with_strong_text = paragraphs.filter(paragraph => {
return strong_regex.test(paragraph);
});
console.log(paragraphs_with_strong_text);
// prints [ '<p>They don\'t just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>' ]
}
P.S. The code is not optimised, you can change it as per the requirement in your application.
Upvotes: 2
Reputation: 11
first you could var arr = text.split("<p>")
in order to be able to work with each sentence individually
then you could loop through your array and search for your phrase inside strong tags
for(var i = 0; i<arr.length;i++){
if(arr[i].search("<strong>"+phrase+"</strong>")!=-1){
console.log("<p>"+arr[i]);
//arr[i] is the the entire sentence containing phrase inside strong tags minus "<p>"
} }
Upvotes: 0
Reputation: 1474
There is cheerio which is something like server-side jQuery. So you can get your page as text, build DOM, and search inside of it.
Upvotes: 0