Massimo

Reputation: 573

Search, replace and regroup with regexp and PHP

I would like to search and replace some tags with regexp.

This is my starting string:

<p>some bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>

And this is the result that I want:

<p>some bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li><li>bla bla and bla</li></ul>
<p>other bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li></ul>
<p>other bla bla bla</p>

So I want to replace every <p>• or <p>&bull; with <li> and the matching </p> with </li>, then wrap each run of consecutive <li></li><li></li><li></li> items in <ul></ul>.

So far I have run some tests and the code below is the result, but I don't think it is the best way, and the regrouping part isn't done.

Searching and Replace

// base string
$text = '<p>some bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>';
// First replace the bullets with a placeholder; I could not get the regexp to match • or &bull; directly
$text = str_replace(array('•', '&bull;'), '!SUB!', $text);
// Match a paragraph (with or without the class) that starts with the placeholder
$regexp = '/(<p( class="normale")?>(!SUB!))(.*?)<\/p>/';
// Replace each bulleted paragraph with an li tag
$text = preg_replace($regexp, "<li>$4</li>\n", $text);

But regrouping the matches into lists is the hard part, and I don't know how to proceed.

Upvotes: 1

Views: 399

Answers (1)

BIG DOG

Reputation: 71

I concur with @Colin. However, is the Searching and Replace code above doing what you want, i.e. is it finding the • character? If so, I'd recommend dropping the !SUB! replacement and matching the bullet directly in your regex:

/(<p( class="normale")?>(&bull;|•))(.*?)<\/p>/

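If not, then you have to find the character's byte or code point representation (in hex or octal) and put that in its place inside the regex. Note that • is not an ASCII character: it is U+2022, which UTF-8 encodes as the bytes E2 80 A2.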
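As a rough sketch of that idea, the bullet can be written as an escape inside the pattern. This assumes the input string is UTF-8 and relies on the `u` modifier so that `\x{2022}` is read as a code point; the sample string and the variable names are made up for illustration.

```php
<?php
// Match a paragraph (with or without class="normale") whose text starts
// with either the &bull; entity or the literal bullet character U+2022.
// The "u" modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/<p(?: class="normale")?>(?:&bull;|\x{2022})(.*?)<\/p>/u';

// Illustrative input: one literal bullet, one entity, one plain paragraph.
$html = "<p class=\"normale\">\u{2022}first</p><p>&bull;second</p><p>plain</p>";

// $1 is the paragraph text without the bullet.
$result = preg_replace($pattern, '<li>$1</li>', $html);

echo $result, "\n";
// prints: <li>first</li><li>second</li><p>plain</p>
```

If the source file or the data is not UTF-8 (for example Windows-1252, where the bullet is the single byte 0x95), the escape would have to be changed accordingly.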

Once you've gotten this far, an XML parser would make quick work of the reordering part of it. :-)
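If pulling in a parser feels like too much, the grouping can also be sketched with regexes alone. To be clear, this swaps the parser for `preg_replace_callback` and is not the parser approach suggested above. It assumes UTF-8 input, assumes bulleted paragraphs in a run are separated only by whitespace, and uses a hypothetical function name, `bulletsToList`.

```php
<?php
// Hypothetical helper: turns each run of consecutive bulleted paragraphs
// into a single <ul> with one <li> per paragraph.
function bulletsToList(string $html): string
{
    // One bulleted paragraph: optional class, then a bullet, then the text.
    $item = '<p(?: class="normale")?>\s*(?:&bull;|\x{2022})(.*?)<\/p>';

    // A run: one or more bulleted paragraphs separated only by whitespace.
    $run = '/(?:' . $item . '\s*)+/su';

    $result = preg_replace_callback($run, function (array $m) use ($item): string {
        // Convert every paragraph of the run to <li>, then wrap the run.
        $items = preg_replace('/' . $item . '\s*/su', '<li>$1</li>', $m[0]);
        return '<ul>' . $items . "</ul>\n";
    }, $html);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}

$sample = "<p>intro</p>\n"
    . "<p class=\"normale\">&bull;one</p><p class=\"normale\">\u{2022}two</p>\n"
    . "<p>outro</p>";

echo bulletsToList($sample), "\n";
// prints:
// <p>intro</p>
// <ul><li>one</li><li>two</li></ul>
// <p>outro</p>
```

Matching the whole run first means the `<ul>` boundaries fall out of the match itself, so no second pass over the `<li>` tags is needed. Nested markup or other attributes on the paragraphs would need a real parser.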

Upvotes: 1
