MarkMark

Reputation: 184

How to ignore a tag inside a RegExp

In my project, we use a RegExp to extract and display the title of cards that we receive from the deck. Recently I found that the deck sometimes sends a different format, and then the titles are not displayed.

Previously it was always a string like this:

const res =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;

and the RegExp was:

/<p class="cardTitle"[^>]*>[\s]*<span[^>]*>(.+)<\/span><span[^>]*>(.+)<\/span>+[\s]*<\/p>/i.exec(res);

Now we sometimes receive res with a <div> and a <br> tag inside:

const res = 
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

The question is: how can I change the RegExp so that it ignores this <div>..<br>.</div>?

Here's a demo:

const res =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;
   
const newRes =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;
   
const regEx = /<p class="cardTitle"[^>]*>[\s]*<span[^>]*>(.+)<\/span><span[^>]*>(.+)<\/span>+[\s]*<\/p>/i;
   
const correct = regEx.exec(res);
const broken = regEx.exec(newRes);
 
console.log('correct', correct);
console.log('broken', broken);

Would be really grateful for any help!

Upvotes: 0

Views: 79

Answers (2)

LukStorms

Reputation: 29647

Simplify the regex:

/<p class="cardTitle"[^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si

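This matches the p tag with its 2 spans and skips whatever else it contains. With the s (dotAll) flag the lazy .*? also matches newlines, so the extra <div> before </p> is consumed without being captured.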
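If the s flag is not available (it was added in ES2018), the same idea can be written with [\s\S], which matches any character including newlines. This is only a sketch under that assumption; the names portableRegEx and sample are made up for illustration.

```javascript
// Same pattern as above, but without the `s` flag: [\s\S] matches any
// character, newlines included, so it stands in for a dotAll `.`.
const portableRegEx = /<p class="cardTitle"[^>]*>\s*<span[^>]*>([\s\S]*?)<\/span><span[^>]*>([\s\S]*?)<\/span>[\s\S]*?<\/p>/i;

// Sample input in the new format (with the extra <div> and <br>).
const sample =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

const match = portableRegEx.exec(sample);
console.log(match[1]); // text of the first span
console.log(match[2]); // text of the second span
```

The lazy quantifiers matter here: a greedy [\s\S]* would run to the last </span> or </p> in the input instead of stopping at the nearest one.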

const res =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;
   
const newRes =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;
   
const regEx = /<p class="cardTitle"[^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si;
   
// Both formats now match; the capture groups hold the text of the two spans.
const correct = regEx.exec(res);
const fixed = regEx.exec(newRes);
 
console.log('correct', correct);
console.log('fixed', fixed);
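To use the result for display, the match can be wrapped in a small helper that returns the two span texts, or null when the markup does not match. This is a minimal sketch; extractCardTitle and the property names text and suffix are hypothetical, not part of the original project.

```javascript
const titleRegEx = /<p class="cardTitle"[^>]*>\s*<span[^>]*>(.*?)<\/span><span[^>]*>(.*?)<\/span>.*?<\/p>/si;

// Hypothetical helper: returns the trimmed text of both spans,
// or null if the input does not look like a card title.
function extractCardTitle(html) {
  const m = titleRegEx.exec(html);
  if (m === null) return null;
  return { text: m[1].trim(), suffix: m[2].trim() };
}

const oldFormat =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
   </p>`;

const newFormat =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;

console.log(extractCardTitle(oldFormat));
console.log(extractCardTitle(newFormat));
console.log(extractCardTitle('<p>no title here</p>'));
```

Checking for null before reading the groups avoids a TypeError if the deck sends yet another format.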

Upvotes: 1

zer00ne

Reputation: 43870

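Parse the htmlString into the DOM, then extract the text. Because a <div> is not allowed inside a <p>, the HTML parser closes the <p> before the <div>, so the .cardTitle element ends up containing only the two spans.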

const res =
  `<p class="cardTitle" style="font-size: 27.2px">
    <span>Some text here</span><span> - Title</span>
    <div style="font-size: 10px">Title:<br>Some text here</div>
   </p>`;
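const getNodes = str => {
  // Parse into a detached document so the live page is not modified
  // and an existing .cardTitle on the page cannot be picked up by mistake.
  const doc = new DOMParser().parseFromString(str, 'text/html');
  const title = doc.querySelector('.cardTitle');
  // textContent does not depend on the element being rendered
  return title ? title.textContent.trim() : '';
};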

console.log(getNodes(res));

Upvotes: 0
