Elitmiar

Reputation: 36839

How to replace text over multiple lines using preg_replace

I have the following content within an HTML page that stretches over multiple lines:

<div class="c-fc c-bc" id="content">
                <span class="content-heading c-hc">Heading 1 </span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br /> 
                The Home Page must provide a introduction to the services provided.<br />
            </div>

I need to replace everything between <div class="c-fc c-bc" id="content"> and </div> with custom text.

I use the following code to accomplish this, but it does not work when the content spans multiple lines; it only works if everything is on one line:

$body = file_get_contents('../../templates/'.$val['url']);

$body = preg_replace('/<div class=\"c\-fc c\-bc\" id=\"content\">(.*)<\/div>/','<div class="c-fc c-bc" id="content">abc</div>',$body);

Am I missing something?

Upvotes: 20

Views: 37667

Answers (4)

meistermuh

Reputation: 522

Instead of combining . with the DOTALL flag s, you can also use [\s\S] to match everything, because it means exactly the same thing: \s matches all whitespace characters (including newlines) and \S matches everything that is not a whitespace character (i.e. everything else). In some regular expression implementations, this works better than enabling DOTALL.

Caution: both .* with the DOTALL flag and [\s\S]* are greedy and will consume as much of the string as they can, up to the last possible match. If you want them to stop at a certain position (e.g. the first </div>), put the non-greedy operator ? after your quantifier, e.g. .*?
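
As a minimal sketch of the difference (the sample string and variable names here are made up for illustration, not taken from the question), compare a greedy and a non-greedy quantifier on two sibling divs:

```php
<?php
// Hypothetical sample input: two sibling divs, each spanning two lines.
$html = "<div>first\nblock</div>\n<div>second\nblock</div>";

// Greedy: [\s\S]* runs on to the LAST </div>, so both divs are swallowed
// by a single match and replaced together.
$greedy = preg_replace('/<div>[\s\S]*<\/div>/', '<div>abc</div>', $html);

// Non-greedy: [\s\S]*? stops at the FIRST </div> it can reach, so each div
// is matched and replaced on its own.
$lazy = preg_replace('/<div>[\s\S]*?<\/div>/', '<div>abc</div>', $html);

echo $greedy, "\n";
echo "---\n";
echo $lazy, "\n";
```

With this input the greedy version should leave a single div, while the non-greedy version keeps two. Neither pattern understands nesting, so a div inside the target div would still end the non-greedy match too early.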

Upvotes: 1

Al.

Reputation: 301

It is possible to use regex to strip out chunks of HTML, but you need to wrap the HTML in custom tags, which browsers ignore. For example:

<?php
$html='
<div>This will be shown</div>
<custom650 rel="nofollow">
  <p class="subformedit">
    <a href="#" class="mylink">Link</a>
    <div class="morestuff">
      ... more html in here ...
    </div>
  </p>
</custom650>
<div>This will also be shown</div>
';

To strip every element that carries the rel="nofollow" attribute, together with its contents, you can use the following regex:

$newhtml = preg_replace('/<([^\s]+)[^>]*rel="nofollow"[^>]*>.*?<\/\1>/si', '', $html);

From experience, start the custom tags on a new line. Undoubtedly a hack, but might help someone.

Upvotes: 0

Will Earp

Reputation: 312

You are missing the "s" flag: it enables . to match newlines.
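
A small sketch of the effect, using a made-up stand-in for the template contents (the $body string below is an assumption, not the asker's real file):

```php
<?php
// Hypothetical stand-in for the template: the target div spans several lines.
$body = "<div class=\"c-fc c-bc\" id=\"content\">\n"
      . "  line one<br />\n"
      . "  line two<br />\n"
      . "</div>";

$pattern     = '/<div class="c-fc c-bc" id="content">(.*)<\/div>/';
$replacement = '<div class="c-fc c-bc" id="content">abc</div>';

// Without "s": . cannot cross a newline, so the pattern never reaches </div>
// and preg_replace returns $body unchanged.
$without = preg_replace($pattern, $replacement, $body);

// With "s": . also matches newlines, so the whole block is replaced.
$with = preg_replace($pattern . 's', $replacement, $body);

var_dump($without === $body); // no replacement happened
echo $with, "\n";
```

Note that (.*) stays greedy here, so if the page contains further </div> tags after the target block, the match will extend to the last one.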

Upvotes: 24

Mark Byers

Reputation: 838156

If this weren't HTML, I'd tell you to use the DOTALL modifier to change the meaning of . from 'match any character except a newline' to 'match any character':

$body = preg_replace('/<div class=\"c\-fc c\-bc\" id=\"content\">(.*)<\/div>/s','<div class="c-fc c-bc" id="content">abc</div>',$body);

But this is HTML, so use an HTML parser instead.
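
One possible way to do that, sketched with PHP's bundled DOM extension (the sample markup and the XPath lookup by id are assumptions for illustration; adapt them to the real template):

```php
<?php
// Hypothetical stand-in for the template contents.
$body = '<html><body><p>intro</p><div class="c-fc c-bc" id="content">
    <span class="c-sc">Sub Heading</span><br />
    old text<br />
</div><p>footer</p></body></html>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($body);
libxml_clear_errors();

// Locate the target div by its id attribute.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="content"]')->item(0);

if ($div !== null) {
    // Remove every existing child node, then add the replacement text.
    while ($div->firstChild) {
        $div->removeChild($div->firstChild);
    }
    $div->appendChild($doc->createTextNode('abc'));
}

$result = $doc->saveHTML();
echo $result;
```

Unlike the regex, this keeps working when the div contains nested divs or when other </div> tags follow it. Be aware that saveHTML() serializes the whole document and may add a doctype or normalize the markup, so the output will not necessarily be byte-for-byte identical to the input outside the replaced block.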

Upvotes: 42
