Reputation: 18204
I currently have the following content:
<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1">
<li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
<li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu."</li>
</ul>
</section>
Sandbox URL: http://regex101.com/r/zQ0lN5
I have the following code in PHP:
$new_content = preg_replace('/(?<=<ul class="sample1">|<\/li>)\s*?(?=<\/ul>|<li.*?>)/is', '', $content);
This works: the whitespace between the ul and li tags and between the li items is removed, so the output is as expected:
<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<!-- SEE BELOW NO WHITE SPACES -->
<ul class="sample1"><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu."</li></ul>
</section>
I would rather do the following:
//Ignore what's between < and > : <ul.*?>
$new_content = preg_replace('/(?<=<ul.*?>|<\/li>)\s*?(?=<\/ul>|<li.*?>)/is', '', $content);
That way someone can add a style or any other attribute to the ul tag without breaking the code. However, PCRE lookbehinds must match a fixed-length string, so quantifiers such as .*? are not allowed inside them. How do I fix this?
Upvotes: 2
Views: 205
Reputation: 8509
Maybe this can do the trick? You don't need lookbehinds.
echo preg_replace('/\s*(<(?:\/ul>|li[\s>]))/i', '$1', $your_document);
Where $your_document is the HTML code you want to process.
So, if this is your HTML:
<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1">
<li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
<li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li>
</ul>
</section>
Output for that looks like:
<section>
<hgroup>
<h1 style="text-align: center;">Koptitel 1</h1>
<h2 style="text-align: center;">Subtitel</h2>
</hgroup>
<ul class="sample1"><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li><li class="sample2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum placerat, urna eget ultricies egestas, lectus mi tincidunt nulla, ut molestie odio lectus ut arcu.</li></ul>
</section>
This removes all whitespace, including new-line (\n) characters, between <ul> and <li>, between </li> and <li>, and between </li> and </ul> tags, so the entire <ul> element in the example ends up on one line with no spaces between the > and < of adjacent tags. The regular expression is case-insensitive, so it matches <LI> as well as <li>.
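If you would rather keep the shape of the pattern from the question, one possible alternative is PCRE's \K escape, which discards everything matched so far from the reported match and so can stand in for a variable-length lookbehind. The following is only a minimal sketch: the function name strip_list_whitespace and the sample markup are hypothetical, and it assumes no attribute value in the ul tag contains a literal > character.

```php
<?php
// Hypothetical helper (not from the original post): removes whitespace that
// sits directly between list tags, whatever attributes the <ul> carries.
function strip_list_whitespace(string $html): string
{
    // (?:<ul\b[^>]*>|<\/li>)  an opening <ul ...> with any attributes, or a closing </li>
    // \K                      drop what was matched so far from the reported match
    // \s+                     the whitespace that will be deleted
    // (?=<\/ul>|<li\b)        only if a </ul> or an opening <li ...> follows
    $pattern = '/(?:<ul\b[^>]*>|<\/li>)\K\s+(?=<\/ul>|<li\b)/i';

    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Sample input (an assumption for illustration only).
$content = "<ul class=\"sample1\" style=\"margin: 0\">\n  <li class=\"sample2\">One</li>\n  <li>Two</li>\n</ul>";

echo strip_list_whitespace($content), "\n";
```

Because only the part after \K is replaced, the <ul ...> and </li> tags themselves are left untouched, and whitespace inside the text of a list item is not affected.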
Upvotes: 2