Primoz Rome

Reputation: 11061

replace string pattern in HTML text with PHP

For my customer I wrote a custom web-based WYSIWYG HTML editor. It allows them to format basic HTML text and insert images. When they insert an image, I store it as a placeholder pattern like ##image1##. The produced HTML can look something like this:

<p>some text and some more text</p>
<p>some text and some <b>bold text</b></p>
<div>##image1##</div>
<p>more text can follow here</p>
<div>##image2##</div>

When outputting this HTML, I search through it and replace the placeholders ##image1##, ##image2## and so on with the HTML markup that actually displays the images. My replace code is here:

// first find all occurrences of the image placeholder
preg_match_all('|##(.+)##|', $inputHTML, $matches);

// then replace the placeholders one at a time
$output = $inputHTML;
foreach ($matches[1] as $imageName) {
    $output = preg_replace('|##(.+)##|', $imageHTML, $output, 1);
}

This works most of the time, but some variations of the input HTML produce a strange result. One piece of HTML that produces a strange result is:

<div>##image1##</div><p class="align-justify"><strong>Peter Dekleva</strong>, <strong>Damir Lisica</strong>, <strong>Anej Kočevar</strong> in <strong>Gregor Jakac</strong> so glasbeniki, ki v svoji glasbi združujejo silovite  instrumentalne vložke, markantne melodije in močna besedila.</p><div>##image2##</div><p class="align-justify">Video dvojček skladbe Brez strahu torej prikazuje oblico sproščenih trenutkov iz zaodrja, veličasnih posnetkov s koncertnega dogajanja, priprav na nastope, nepredvidljive zaključke noči.</p>

If I edit that HTML and add a line break before <div>##image2##</div>, it is parsed correctly. Any idea what is happening here and why I have this problem?

I am also open to suggestions for a better way of doing this. I can insert something other than ##image1## when inserting an image in my WYSIWYG editor... Thanks

Upvotes: 1

Views: 858

Answers (2)

Aprillion

Reputation: 22340

You should create the <img/> tag directly. But anyway, if you don't use # in your image names, use [^#] instead of .

Also, if you are not sure that ## won't appear elsewhere in the HTML, match the surrounding <div> too:

<div>##([^#]+)##</div>
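
A minimal sketch of that idea, assuming the intended pattern is the negated character class [^#]+ (one or more characters that are not #). The $images lookup table and its image paths are hypothetical and only there for illustration; preg_replace_callback replaces every placeholder in a single pass, so no separate preg_match_all loop is needed.

```php
<?php
// Hypothetical lookup table: placeholder name => image markup (assumed for illustration).
$images = [
    'image1' => '<img src="/img/one.jpg" alt="">',
    'image2' => '<img src="/img/two.jpg" alt="">',
];

$inputHTML = '<div>##image1##</div><p>text</p><div>##image2##</div>';

// [^#]+ matches one or more characters that are not '#',
// so a match can never run past the closing ##.
$output = preg_replace_callback(
    '|<div>##([^#]+)##</div>|',
    function (array $m) use ($images) {
        // Leave unknown placeholders untouched.
        return isset($images[$m[1]]) ? '<div>' . $images[$m[1]] . '</div>' : $m[0];
    },
    $inputHTML
);

echo $output, "\n";
```

Because the callback receives the captured name in $m[1], each placeholder can be mapped to its own markup instead of one shared $imageHTML.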

Upvotes: 0

Jacob Eggers

Reputation: 9332

This is because the + quantifier is greedy, so it matches everything up to the last instance of ## on the line. Try adding a ? after the + to make it ungreedy (lazy):

|##(.+?)##|
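
To see the difference, here is a small sketch that runs both patterns against a single-line string shaped like the one in the question (the shortened $html value is made up for illustration):

```php
<?php
$html = '<div>##image1##</div><p>text</p><div>##image2##</div>';

// Greedy: .+ runs to the last ## on the line, so there is one big match.
preg_match_all('|##(.+)##|', $html, $greedy);

// Lazy: .+? stops at the first closing ##, giving one match per placeholder.
preg_match_all('|##(.+?)##|', $html, $lazy);

print_r($greedy[1]); // one element: image1##</div><p>text</p><div>##image2
print_r($lazy[1]);   // two elements: image1, image2
```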

The reason a line break fixes the problem is that by default the . doesn't match line breaks. However, if you had used |##(.+)##|s instead, the line break wouldn't have fixed the problem.
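
A quick way to check this, sketched with a made-up two-line $html string: count the matches of the greedy pattern with and without the s (PCRE_DOTALL) modifier.

```php
<?php
// Same kind of markup, but with a line break between the two placeholders.
$html = "<div>##image1##</div>\n<div>##image2##</div>";

// Without s, . stops at the newline, so each line matches separately.
$withoutS = preg_match_all('|##(.+)##|', $html, $a);

// With s, . also matches the newline and the greedy match spans both lines.
$withS = preg_match_all('|##(.+)##|s', $html, $b);

echo $withoutS, "\n"; // 2
echo $withS, "\n";    // 1
```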

Edit: I just noticed that churk's answer to your previous question would have also worked correctly.

Upvotes: 1
