I have a problem with a regular expression in PHP.
This text should be handled:
Start Text1
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
End Text1
Start Text2
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
End Text2
I would like to add <ul> and </ul> to the <li> lines.
I tried this pattern:
(?!<\/li>)\s*(<li>.*</li>)\s*(?=<li>|)
But it gives something like this:
Start Text1
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
End Text1
Start Text2
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
End Text2
... so "End Text1" and "Start Text2" are also included inside the list. I would prefer to get this result:
Start Text1
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
End Text1
Start Text2
<ul>
<li>Item1</li>
<li>Item2</li>
<li>Item3</li>
</ul>
End Text2
How can I do this?
I tested this here: https://www.phpliveregex.com/p/sHs#tab-preg-replace
Upvotes: 2
Views: 72
Reputation: 12229
Fixing the regex
This regular expression works (with the s flag, so that . also matches newlines):
(\s*<li>.*?<\/li>\s*)(?!\s*<li>)
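As a quick check, the pattern can be run against the sample text from the question with preg_match_all. This is only a sketch: the variable names ($input, $pattern, $matches, $count) are illustrative, and the s flag is included so that . can cross the newlines between the <li> lines.

```php
<?php
// Sample text from the question; variable names are illustrative.
$input = "Start Text1\n<li>Item1</li>\n<li>Item2</li>\n<li>Item3</li>\nEnd Text1\n"
       . "Start Text2\n<li>Item1</li>\n<li>Item2</li>\n<li>Item3</li>\nEnd Text2";

// The s flag lets . match newlines, so one match can span several <li> lines.
$pattern = '/(\s*<li>.*?<\/li>\s*)(?!\s*<li>)/s';

$count = preg_match_all($pattern, $input, $matches);

// Print each captured run of <li> lines, without the surrounding whitespace.
foreach ($matches[1] as $block) {
    echo "--- block ---\n", trim($block), "\n";
}
```

With this input, the pattern should find two blocks, one per run of <li> lines, rather than a single block spanning both.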
Explanation:
- .*? asks the regex to match as little as possible between <li> and </li>, so that it stops as soon as there is text not within an <li>;
- \/ escapes the slash in the second instance of </li>, as you had already done in the first instance;
- (?!\s*<li>) says the next bit of text cannot be another <li>. This is needed because otherwise .*? above makes it match each <li> line separately;
- (?!<\/li>) doesn't actually do anything, so I removed it.
Nicer handling of newlines
On the Live Regex web site, I was not able to insert newlines where I wanted to.
In PHP proper, you can use
preg_replace('/\s*(<li>.*?<\/li>)\s*(?!\s*<li>)/smi',
"\n<ul>\n$1\n</ul>\n", $input)
or
preg_replace('/(\s*<li>.*?<\/li>\s*)(?!\s*<li>)/smi',
"\n<ul>$1</ul>\n", $input)
to get nicer results. The key is to put the replacement string in double quotes, so that PHP turns \n into a real newline.
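For reference, here is a minimal, self-contained sketch of the first variant applied to the sample text from the question. The variable names ($input, $pattern, $output) are illustrative and not part of the original snippets.

```php
<?php
// Sample text from the question; variable names are illustrative.
$input = "Start Text1\n<li>Item1</li>\n<li>Item2</li>\n<li>Item3</li>\nEnd Text1\n"
       . "Start Text2\n<li>Item1</li>\n<li>Item2</li>\n<li>Item3</li>\nEnd Text2";

// Group 1 captures one whole run of <li> lines; the whitespace around the
// run is matched outside the group and replaced by explicit newlines.
$pattern = '/\s*(<li>.*?<\/li>)\s*(?!\s*<li>)/smi';

// Double quotes make PHP expand \n; $1 is left as-is for preg_replace.
$output = preg_replace($pattern, "\n<ul>\n$1\n</ul>\n", $input);

echo $output, "\n";
```

Run on this input, each run of <li> lines should end up wrapped in its own <ul> ... </ul> pair, with "End Text1" and "Start Text2" left outside the lists.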
Handling indented input better
If the input was indented, you might also consider something like this:
preg_replace('/(\s*)(<li>.*?<\/li>)(\s*)(?!\s*<li>)/smi',
"$1<ul>$1$2$1</ul>$3", $input)
This will put <ul> and </ul> at the same indentation level as the first <li>, and keep the surrounding text at the indentation it had beforehand.
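A small sketch of how this behaves on indented input follows. The two-space indentation and the variable names ($input, $pattern, $output) are assumptions made for illustration only.

```php
<?php
// Hypothetical indented input: each <li> line starts with two spaces.
$input = "Start Text1\n  <li>Item1</li>\n  <li>Item2</li>\n  <li>Item3</li>\nEnd Text1";

// $1 = whitespace before the first <li> (newline plus indentation),
// $2 = the run of <li> lines, $3 = whitespace after the last </li>.
$pattern = '/(\s*)(<li>.*?<\/li>)(\s*)(?!\s*<li>)/smi';

// Reusing $1 before <ul>, before the items, and before </ul> gives all three
// the same indentation as the first <li>.
$output = preg_replace($pattern, "$1<ul>$1$2$1</ul>$3", $input);

echo $output, "\n";
```

Because $1 includes the newline before the first <li>, reusing it also supplies the line breaks around the added tags.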
But obviously none of this is really important given all these spacing variants won't change the interpretation of the resulting HTML.
Upvotes: 1