M3AT58

How to match character inside inline style?

I want to match a single letter, digit, or symbol inside an inline style attribute.

Example:

<html>
    <head>
    </head>
    <body>
        <p style="color: #48ad64;font-weight:10px;">hi there</p>
        <div style="background-color: #48ad64;">
            <h3>perfect</h3>
        </div>
    </body>
</html>

I want to be able to match any such character, for example c, o, #, 4, ; or -.

Taking o as an example, it should match 6 occurrences.

I want to replace every occurrence inside a style attribute using preg_replace().

How can I do this? I have tried many different expressions, but none of them does what I want.

Some of what I tried:

  1. /(?:\G(?!^)|\bstyle=")(?:.{0,}?)(o)(?=[^>]*>)/

  2. /(style=")(?:\w+)(o)(([^"]*)")/

I just need the regex to match every o inside the style attributes of my HTML. I expect this:

<html>
   <head>
   </head>
   <body>
      <p style="c'o'l'o'r: #48ad64;f'o'nt-weight:10px;">how blabla</p>
      <div style="backgr'o'und-c'o'l'o'r: #48ad64;">
          <h3>perfect normal o moral bla bal</h3>
      </div>
   </body>
</html>

I just want every o inside the inline styles above to be replaced with 'o'.

Answers (1)

mickmackusa

A quick-and-dirty solution is to use preg_replace_callback() to isolate each style value and str_replace() to modify it.

Pattern: /<[^<]+ style="\K.*?(?=">)/
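The pattern is dense, so here is a sketch of the same expression rewritten with the x (extended) modifier, one comment per component, followed by a preg_match_all() call that lists what it isolates. The sample markup is a trimmed copy of the question's HTML; the x-mode layout is only for explanation and matches the same text as the one-line pattern.

```php
<?php
// The same pattern as above, written with the x (extended) flag so each piece can be commented.
// Under x, whitespace in the pattern is ignored, so the literal space before "style" is escaped.
$pattern = '/
    <[^<]+      # a "<" plus the tag name and any attributes before style
    \ style="   # a literal space followed by style="
    \K          # discard everything matched so far from the reported match
    .*?         # the style value itself, matched lazily
    (?=">)      # stop just before the first "> (closing quote, then end of tag)
/x';

$html = '<p style="color: #48ad64;font-weight:10px;">hi there</p>
<div style="background-color: #48ad64;">
    <h3>perfect</h3>
</div>';

preg_match_all($pattern, $html, $matches);
print_r($matches[0]); // prints an array holding only the two style values
```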

Code:

$html='<html>
    <head>
    </head>
    <body>
        <p style="color: #48ad64;font-weight:10px;">hi there</p>
        <div style="background-color: #48ad64;">
            <h3>perfect</h3>
        </div>
    </body>
</html>';

$needle="o";
echo preg_replace_callback(
    '/<[^<]+ style="\K.*?(?=">)/',  // add the i flag after the closing delimiter for case-insensitive matching
    function ($m) use ($needle) {
        // wrap each needle in the isolated style value; use str_ireplace() for case-insensitive replacing
        return str_replace($needle, "<b>$needle</b>", $m[0]);
    },
    $html
);

Output:

<html>
    <head>
    </head>
    <body>
        <p style="c<b>o</b>l<b>o</b>r: #48ad64;f<b>o</b>nt-weight:10px;">hi there</p>
        <div style="backgr<b>o</b>und-c<b>o</b>l<b>o</b>r: #48ad64;">
            <h3>perfect</h3>
        </div>
    </body>
</html>

Here is a pure-regex replacement pattern:

$needle="o";
//                                               vv-----------vv--make the needle value literal
echo preg_replace('/(?:\G(?!^)|\bstyle=")[^"]*?\K\Q'.$needle.'\E/',"'$needle'",$html);
//        assumes no escaped " in style--^^^^  ^^-restart fullstring match

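The [^"]*? component keeps every match inside the current attribute value, which eliminates the need for a lookahead. It does assume the style value is wrapped in double quotes (style='...' is never matched), and if a value such as a font-family name contained \" (escaped double quotes), replacement accuracy would suffer.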
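Because / is the pattern delimiter, PHP looks for the closing / before PCRE ever sees \Q...\E, so a needle such as / (common in shorthand like font: 12px/1.5) ends the pattern early, and a needle containing \E would end the literal run. A minimal sketch of one way around this, assuming a non-empty needle: escape it with preg_quote() instead, and return the wrapped match from a callback so the replacement is used literally. The helper name quoteInStyle() is hypothetical, and this is a variation on the answer rather than part of it.

```php
<?php
// Hypothetical helper (not from the answer): the same \G pattern, but the needle is escaped
// with preg_quote() instead of \Q...\E, so a "/" needle cannot close the pattern early.
function quoteInStyle(string $html, string $needle): string
{
    $pattern = '/(?:\G(?!^)|\bstyle=")[^"]*?\K' . preg_quote($needle, '/') . '/';

    // A callback returns the replacement literally, so "$" or "\" needles are not
    // interpreted as backreferences the way they would be in a preg_replace() replacement.
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return "'" . $m[0] . "'";
        },
        $html
    );
}

echo quoteInStyle('<p style="font: 12px/1.5 serif;">a/b</p>', '/'), "\n";
// <p style="font: 12px'/'1.5 serif;">a/b</p>
```

Only the / inside the style value is wrapped; the one in the text a/b is left alone because the \G chain stops at the closing double quote.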

I wouldn't call either of these methods "robust", because certain substrings (for example, the text style=" inside a paragraph or an HTML comment) can trick the patterns into over-matching substrings that are not real style attributes.
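To make the over-matching risk concrete, here is a small sketch that feeds each pattern a made-up input it gets wrong, wrapping matches as 'o' like the question asks. The first input shows the lookahead pattern running past the end of a style value when another attribute follows it; the second shows the \G pattern treating plain text that contains style=" as if it were an attribute.

```php
<?php
// Two made-up inputs that fool the regex approaches above (needle "o", wrapped as 'o').
$needle = "o";

// 1) The lookahead pattern assumes style is the last attribute: the lazy .*? runs past
//    the closing quote of style until the first "> and so also rewrites the class value.
$attrAfterStyle = '<p style="color:red" class="box">hi</p>';
$out1 = preg_replace_callback(
    '/<[^<]+ style="\K.*?(?=">)/',
    function ($m) use ($needle) {
        return str_replace($needle, "'$needle'", $m[0]);
    },
    $attrAfterStyle
);
echo $out1, "\n"; // <p style="c'o'l'o'r:red" class="b'o'x">hi</p>

// 2) The \G pattern cannot tell a real attribute from plain text that contains style="
$textOnly = '<p>Type style="color" here</p>';
$out2 = preg_replace('/(?:\G(?!^)|\bstyle=")[^"]*?\K\Q' . $needle . '\E/', "'$needle'", $textOnly);
echo $out2, "\n"; // <p>Type style="c'o'l'o'r" here</p>
```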

To do this properly, I suggest using DOMDocument (or another HTML parser) so that you only modify genuine style attributes.

DOMDocument code:

$dom = new DOMDocument;
// flags: don't add implied <html>/<body> wrappers or a default DOCTYPE
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
foreach ($xp->query('//*[@style]') as $node) {   // every element that has a style attribute
    // plain str_replace() on the attribute value; no regex needed
    $node->setAttribute('style', str_replace($needle, "'$needle'", $node->getAttribute('style')));
}
echo $dom->saveHTML();
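As a quick check, here is the DOMDocument code above wrapped in a hypothetical helper, quoteInStyleAttributes(), and run on two made-up inputs: one where style=" appears only in text, and one where another attribute follows style. The sketch assumes each fragment has a single root element, since LIBXML_HTML_NOIMPLIED can mis-nest multiple top-level elements.

```php
<?php
// Hypothetical wrapper around the DOMDocument approach: only real style attributes are edited,
// so text that merely contains style=" is left alone.
function quoteInStyleAttributes(string $html, string $needle): string
{
    $dom = new DOMDocument();
    // Do not add implied <html>/<body> wrappers or a default DOCTYPE to the fragment
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xp = new DOMXPath($dom);
    foreach ($xp->query('//*[@style]') as $node) {
        $node->setAttribute('style', str_replace($needle, "'$needle'", $node->getAttribute('style')));
    }
    return $dom->saveHTML();
}

echo quoteInStyleAttributes('<p>Type style="color" here</p>', 'o');            // text is unchanged
echo quoteInStyleAttributes('<p style="color:red" class="box">hi</p>', 'o');   // only the style value changes
```

Because the parser knows where each attribute starts and ends, neither the text nor the class value is rewritten.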
