hiroimono

Reputation: 43

How to remove HTML tags from an HTML string using RegEx?

I have an HTML string containing names:

<div class=\"ExternalClassBE95E28C1751447DB985774141C7FE9C\"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>

I would like to remove all HTML tags and put '&' between the names, but not after the last one:

Not desired: Tina Schmelz & Sascha Balke &
Desired:     Tina Schmelz & Sascha Balke

I used a regex together with the string replace methods. I first replaced every <br> tag with ' & ' and then removed all remaining HTML tags with this code:

let mytext = '<div class=\"ExternalClassBE95E28C1751447DB985774141C7FE9C\"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';
mytext = mytext.replaceAll(/<br>/gi, ' & ');
mytext = mytext.replaceAll(/<.*?>/gi, ''); 

console.log(mytext)

My question: how can I remove the last ' & '? Or does anyone know a better RegEx that does everything in one line? :)

Upvotes: 2

Views: 3791

Answers (2)

Wiktor Stribiżew

Reputation: 627469

You can use

.replace(/<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi, (x,y) => y ? ' & ' : '')

See the JavaScript demo:

const text = '<div class="ExternalClassBE95E28C1751447DB985774141C7FE9C"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';
const regex = /<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi;
console.log(
  text.replace(regex, (x,y) => y ? ' & ' : '')
);

Details:

  • <br>(?=(?:\s*<[^>]*>)*$) - a <br> that is followed only by zero or more <...> tags, each optionally preceded by whitespace, up to the end of the string
  • | - or
  • (<br>) - Group 1: any other <br> tag
  • | - or
  • <[^>]*> - <, zero or more chars other than > and then a >.
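
To see how the three alternatives interact, here is a minimal sketch that wraps the same regex in a helper. The function name joinNames and the shortened sample strings are assumptions made for illustration; they are not part of the answer. Only a <br> matched by the second alternative fills Group 1, so only those are turned into ' & '; the trailing <br> and every other tag are replaced with an empty string.

```javascript
// Hypothetical helper; the regex itself is the one explained above.
function joinNames(html) {
  const regex = /<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi;
  // `br` is Group 1: defined only when the second alternative matched.
  return html.replace(regex, (match, br) => (br ? ' & ' : ''));
}

// Shortened sample inputs (assumed for illustration).
const twoNames = '<div><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';
const oneName = '<div><p>Tina Schmelz<br></p></div>';

console.log(joinNames(twoNames)); // Tina Schmelz & Sascha Balke
console.log(joinNames(oneName));  // Tina Schmelz
```

With a single name, the only <br> is also the trailing one, so no ' & ' is produced at all.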

Upvotes: 1

aaandri98

Reputation: 605

You could split the string on <br>, strip the remaining tags from each piece with the regex you have already written, and then join the non-empty pieces with ' & ' so that the separator appears only between the names.

let myText = '<div class="ExternalClassBE95E28C1751447DB985774141C7FE9C"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';

// Split on <br>, strip the remaining tags from each piece, and drop empty pieces.
const names = myText
  .split('<br>')
  .map(part => part.replaceAll(/<.*?>/gi, ''))
  .filter(name => name.length > 0);

// Joining puts ' & ' only between the names, never after the last one.
myText = names.join(' & ');

console.log(myText)

Upvotes: 1
