Kyle

Reputation: 1173

Strip certain HTML from string

I am using ngx-quill and the input body returns some HTML elements.

Example

<p><strong><em><u>"Soft fingers began to tap the sill of the car window, and the hard fingers tightened on the restless drawing sticks. In the doorways of the sun-beaten tenant houses, women sighed and then shifted feet so that the one that had been down was now on top, and the toes working. Dogs came sniffing near the owner cars and wetted on all four tires one after another. And chickens lay in the sunny dust and fluffed their feathers </u></em></strong></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><strong><em>to get the cleansing dust

I want to remove all of the HTML tags, except the newline paragraphs.

When a post contains several consecutive line breaks, ngx-quill emits a chain of empty paragraphs such as <p><br></p><p><br></p> (see above).

I've tried using the replace function to strip the elements, but certain elements such as <u> are not being removed. Also, how can I consolidate the sections that have several line breaks into just one line break?

I have tried

post = '<p><strong><em><u>"Soft fingers began to tap the sill of the car window, and the hard fingers tightened on the restless drawing sticks. In the doorways of the sun-beaten tenant houses, women sighed and then shifted feet so that the one that had been down was now on top, and the toes working. Dogs came sniffing near the owner cars and wetted on all four tires one after another. And chickens lay in the sunny dust and fluffed their feathers </u></em></strong></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><strong><em>to get the cleansing dust down to the skin. In the little sties the pigs grunted inquiringly over the muddy remnants of the slops.""Soft fingers began to tap the sill of the car window, and the hard fingers tightened on the restless drawing sticks. In the doorways of the sun-beaten tenant houses, women sighed and then shifted feet so that the one that had been down was now on top, and the toes working. Dogs came sniffing near the owner cars and wetted on all four tires one after another. And chickens lay in the sunny dust and fluffed their feathers to get the cleansing dust down to the skin. In the little sties the pigs grunted inquiringly over the muddy remnants of the slops."</em></strong></p>'

function stripElements(post: any) {
    let newPost = post;
    newPost = newPost.replace('<u>', '<span>');
    newPost = newPost.replace('</u>', '</span>');
    newPost = post.replace('<strong>','');
    newPost = newPost.replace('</strong>', '');
    newPost = newPost.replace('<em>', '');
    newPost = newPost.replace('</em>', '');

    newPost = newPost.replace('<p><br></p>', '<p></p>')
    
    return newPost;
}

Upvotes: 2

Views: 78

Answers (2)

skara9

Reputation: 4194

You can use the DOMParser API to parse and manipulate the HTML code:

post = '<p><strong><em><u>"Soft fingers began to tap the sill of the car window, and the hard fingers tightened on the restless drawing sticks. In the doorways of the sun-beaten tenant houses, women sighed and then shifted feet so that the one that had been down was now on top, and the toes working. Dogs came sniffing near the owner cars and wetted on all four tires one after another. And chickens lay in the sunny dust and fluffed their feathers </u></em></strong></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><strong><em>to get the cleansing dust down to the skin. In the little sties the pigs grunted inquiringly over the muddy remnants of the slops.""Soft fingers began to tap the sill of the car window, and the hard fingers tightened on the restless drawing sticks. In the doorways of the sun-beaten tenant houses, women sighed and then shifted feet so that the one that had been down was now on top, and the toes working. Dogs came sniffing near the owner cars and wetted on all four tires one after another. And chickens lay in the sunny dust and fluffed their feathers to get the cleansing dust down to the skin. In the little sties the pigs grunted inquiringly over the muddy remnants of the slops."</em></strong></p>'

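function stripElements(post) {
  // Parse the HTML string into a separate document
  const doc = new DOMParser().parseFromString(post, 'text/html');
  // Replace every non-<p> element inside <body> with its text content
  doc.querySelectorAll('body :not(p)').forEach(el => el.replaceWith(el.textContent));
  return doc.body.innerHTML;
}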

console.log(stripElements(post))
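
The function above should leave each <p><br></p> behind as an empty <p></p>, so the long run of blank paragraphs is still there. A minimal sketch of one way to consolidate them on the resulting string follows. The name collapseEmptyParagraphs is hypothetical (it is not part of the answer's code or of ngx-quill), and it assumes "empty" means a paragraph holding only whitespace and/or <br> tags.

```javascript
// Hypothetical helper, not part of the original answer: collapses each run of
// empty paragraphs into a single empty paragraph.
function collapseEmptyParagraphs(html) {
  // One or more consecutive paragraphs that contain nothing but whitespace
  // and/or <br> tags, e.g. "<p></p><p><br></p>".
  const emptyRun = /(?:<p>(?:\s|<br\s*\/?>)*<\/p>\s*)+/gi;
  return html.replace(emptyRun, '<p></p>');
}

const input = '<p>first</p><p></p><p></p><p></p><p>second</p>';
console.log(collapseEmptyParagraphs(input));
// → <p>first</p><p></p><p>second</p>
```

It could be chained as collapseEmptyParagraphs(stripElements(post)). It only recognises bare <p> tags, so paragraphs carrying attributes would need a looser pattern.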

Upvotes: 3

kshetline

Reputation: 13734

Rule #1: Don't manipulate HTML with regexes. Use a DOM parser instead.

Rule #2: You probably don't want to fuss with the overhead of a DOM parser, just want to get the job done, and are likely to ignore Rule #1.

Therefore, if you wish, something like this might do the trick:

return post.replace(/<\/?[a-z]+>/gi, m => m.toLowerCase() === '<br>' ? '<p></p>' : '');

I'm not exactly sure this is how you wanted to handle the line breaks, but given this as a start you should be able to tweak it as you need.
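
For context on why the attempt in the question left tags behind: replace with a string pattern only replaces the first occurrence, and the line newPost = post.replace('<strong>','') starts again from the original post, which discards the earlier <u> replacements. As one possible tweak of the regex above, here is a sketch that keeps the <p> tags and drops everything else. The name stripTagsExcept and its keep parameter are hypothetical, and the pattern assumes no attribute value contains a ">" character.

```javascript
// Hypothetical helper: removes every tag whose name is not listed in `keep`.
function stripTagsExcept(html, keep = ['p']) {
  const allowed = new Set(keep.map(name => name.toLowerCase()));
  // Captures the tag name of opening and closing tags; the `g` flag makes
  // replace() visit every match instead of only the first one.
  return html.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (tag, name) =>
    allowed.has(name.toLowerCase()) ? tag : ''
  );
}

const sample = '<p><strong><em><u>one</u></em></strong></p><p><br></p><p>two</p>';
console.log(stripTagsExcept(sample));
// → <p>one</p><p></p><p>two</p>
```

Passing ['p', 'br'] as the second argument would keep the <br> tags as well. Rule #1 still applies: this is fine for simple editor output like the example, but not for arbitrary HTML.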

Upvotes: 2
