J Seabolt

Reputation: 2998

Remove a style from an html string

I have an HTML string. It may be any number of elements. I want to remove ANY occurrence of inline styles containing font-size.

For example:

`<p><span style="font-size: 24px;">ORDER</span></p>`

I need that font-size removed. I can't quite figure out how to do this with a JavaScript regular expression. Can I get some help?

Upvotes: 0

Views: 4504

Answers (2)

user557597

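This regex does it. It removes the whole style attribute, and as written it only matches when font-size: is the first declaration in the attribute value.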

Find

 (<[\w:]+)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)\s*style\s*=\s*(?:(['"])\s*font-size:(?:(?!\3)[\S\s])*\3)\s*((?:[^>"']|"[^"]*"|'[^']*')*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

Replace

$1$2$4

https://regex101.com/r/4LC6R0/1
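Since the question asks about JavaScript, here is a minimal sketch of how the find/replace pair above might be applied with `String.prototype.replace`. The helper name `stripFontSizeStyles` is made up for illustration, and the `g` flag is an addition so that every tag in the string is processed; the pattern itself is copied from this answer unchanged.

```javascript
// Pattern copied from the answer above; the `g` flag is added here (an assumption)
// so that all matching tags are rewritten, not just the first one.
const fontSizeStyle = /(<[\w:]+)(?=((?:[^>"']|"[^"]*"|'[^']*')*?\s)\s*style\s*=\s*(?:(['"])\s*font-size:(?:(?!\3)[\S\s])*\3)\s*((?:[^>"']|"[^"]*"|'[^']*')*?>))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>/g;

// Hypothetical helper name, not part of the original answer.
function stripFontSizeStyles(html) {
  // $1 = tag name, $2 = attributes before style, $4 = attributes after style plus '>'
  return html.replace(fontSizeStyle, '$1$2$4');
}

const cleaned = stripFontSizeStyles('<p><span style="font-size: 24px;">ORDER</span></p>');
console.log(cleaned);

// A style whose first declaration is not font-size is left alone by this pattern.
const untouched = stripFontSizeStyles('<b style="color:red; font-size: 12px">x</b>');
console.log(untouched);
```

The second call illustrates the limitation noted in the commented pattern: the quote must be followed directly by `font-size:` (optionally after whitespace) for the tag to match.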

Regex with comments

 ( < [\w:]+ )           # (1), Any tag

 (?=                    # Assert (a pseudo atomic group)
      (                      # (2 start), Before style
           (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
           \s 
      )                      # (2 end)
      \s* style \s* = \s*    # Style attribute
      (?:
           ( ['"] )               # (3), Quote
           \s* font-size:         # Containing   font-size:
           (?:
                (?! \3 )
                [\S\s] 
           )*
           \3 
      )
      \s* 
      (                      # (4 start), After style
           (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
           >
      )                      # (4 end)
 )

 # Have everything just consume the rest of the tag
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 >

Upvotes: 0

Emma

Reputation: 27743

Edit:

As revo mentioned:

You're using JS, a language that leverages the DOM.

So why not use it?

ANY occurrence of inline styles containing font-size should be removed

var myString = `
<p>
  <span style="font-size: 24px;">ORDER</span>
  <span style="color:blue">
    <b style="line-index:5px; font-size: 12px; margin: 5px">something</b>
  </span>
</p>
`;
var divElement = document.createElement('div');
divElement.innerHTML = myString;

// loop through ALL DOM elements inside the divElement
var elements = divElement.getElementsByTagName("*");
for (var i = 0; i < elements.length; i++) {
  // remove the style attribute entirely if it contains a font-size property
  if ((elements[i].getAttribute('style') || '').includes('font-size')) {
    elements[i].removeAttribute('style');
  }
}

// here is your font-size-free string
console.log(divElement.innerHTML);
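The loop above drops the whole style attribute, so sibling declarations such as margin are lost too. If only the font-size declaration should go, something along these lines might work. Both helper names (`withoutFontSize`, `stripFontSize`) are hypothetical, and the split on `;` is a deliberate simplification that assumes no declaration value contains a semicolon.

```javascript
// Hypothetical helper: drop font-size declarations from a style attribute value.
// Assumption: no value contains ';' (a data: URL, for instance, would break this).
function withoutFontSize(styleValue) {
  return styleValue
    .split(';')
    .map((declaration) => declaration.trim())
    .filter((declaration) => declaration !== '' && !/^font-size\s*:/i.test(declaration))
    .join('; ');
}

// Hypothetical helper: works on anything exposing getAttribute / setAttribute /
// removeAttribute, such as a DOM element.
function stripFontSize(element) {
  const style = element.getAttribute('style');
  if (style === null) return;
  const remaining = withoutFontSize(style);
  if (remaining === '') {
    // Nothing left, so remove the now-empty attribute.
    element.removeAttribute('style');
  } else {
    element.setAttribute('style', remaining);
  }
}

console.log(withoutFontSize('line-height:5px; font-size: 12px; margin: 5px'));
// prints "line-height:5px; margin: 5px"
```

Inside the loop, `stripFontSize(elements[i])` would take the place of the `includes` check and the `removeAttribute` call. In a browser, `elements[i].style.removeProperty('font-size')` is another option that leaves the parsing to the DOM.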


If we only want to extract the font-size number, we can start with an expression like this:

(?:font-size:\s+)([0-9]+)(?:.+?")

Here, (?:font-size:\s+) is a non-capturing group that acts as the left boundary (note that \s+ requires at least one whitespace character after the colon), ([0-9]+) captures the digits we want, and a second non-capturing group (?:.+?") consumes everything up to the next ".

These three groups can be adjusted if we want other outputs.

DEMO

const regex = /(?:font-size:\s+)([0-9]+)(?:.+?")/gm;
const str = `"<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>"`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
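The same extraction can be written more compactly with `String.prototype.matchAll`. This is only a sketch: the helper name `findFontSizes` is hypothetical, and the pattern is a loosened variant of the one above (it uses `\s*` so that `font-size:12px` also matches, and it captures the unit separately).

```javascript
// Hypothetical helper: list every font-size value found in an HTML string.
function findFontSizes(html) {
  // Group 1: the number (optionally with a decimal part). Group 2: the unit, if any.
  const pattern = /font-size:\s*([0-9]+(?:\.[0-9]+)?)([a-z%]*)/gi;
  return Array.from(html.matchAll(pattern), (m) => ({ value: Number(m[1]), unit: m[2] }));
}

const sizes = findFontSizes('<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>');
console.log(sizes); // one entry: value 12, unit "px"
```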


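If we want to remove the style attribute and everything in it, this expression might work. Note that it removes every double-quoted style attribute, whether or not it contains font-size: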

(style=".+?")

DEMO

const regex = /(style=".+?")/gm;
const str = `"<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>""<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>""<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>""<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>"`;
const subst = ``;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
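Because (style=".+?") matches every style attribute, a narrower variant may be closer to what the question asks for. The sketch below passes a callback to `replace` and drops a style attribute only when its value contains a font-size declaration. The helper name `dropFontSizeStyleAttributes` is hypothetical, and the pattern assumes double-quoted attribute values with no embedded double quotes.

```javascript
// Hypothetical helper; assumes double-quoted style attributes.
function dropFontSizeStyleAttributes(html) {
  // The leading \s keeps attributes such as data-style from matching.
  return html.replace(/\sstyle="([^"]*)"/g, (attribute, value) =>
    // Drop the attribute only if a declaration starts with "font-size:".
    /(^|;)\s*font-size\s*:/i.test(value) ? '' : attribute
  );
}

const input = '<div style="color: red;"><p style="font-size: 12px">Stuff</p></div>';
console.log(dropFontSizeStyleAttributes(input));
// prints <div style="color: red;"><p>Stuff</p></div>
```

Like the DOM version, this removes the whole attribute when font-size is present, so any other declarations in the same attribute are lost.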

Upvotes: 6
