Raj

Reputation: 3061

How do I match the content between all <li> tags?

How do I match all the <li> tags in the HTML code below?

<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>

This expression doesn't work:

<li>(.*)</li>

Because it returns:

some content</li>
    <li> some other content</li>
    <li> some other other content.

That is the content between the first <li> and the last </li>.

Upvotes: 3

Views: 7296

Answers (7)

Blender

Reputation: 298206

Someone please link the Regex HTML Parser question...

There is a reason HTML parsers exist, which is to parse HTML.

This solution is a bit long, but it is versatile and works for elements with classes, ids, etc:

<?php

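// Serialize the children of $node (its inner HTML) by copying them
// into a fresh document and saving that document as HTML.
function innerHTML($node) {
  $doc = new DOMDocument();

  foreach ($node->childNodes as $child) {
    // importNode(..., true) makes a deep copy that $doc is allowed to own.
    $doc->appendChild($doc->importNode($child, true));
  }

  return $doc->saveHTML();
}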

$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";

$document = new DOMDocument();
$document->loadHTML($string);

$ul = $document->getElementsByTagName("ul");

foreach ($ul as $element) {
  print innerHTML($element);
}

?>

It seems like you don't need the tag names. Try this simpler code:

<?php

$string = "<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>";

$document = new DOMDocument();
$document->loadHTML($string);

// Collect every <li> element; nodeValue is the element's text content.
$items = $document->getElementsByTagName("li");

foreach ($items as $element) {
  print $element->nodeValue;
}

?>

Upvotes: 3

manojlds

Reputation: 301167

Try using .*? rather than .*. The ? makes the quantifier lazy (non-greedy), so it matches as little as possible.
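
To see the difference between the two quantifiers, here is a minimal sketch in PHP. It assumes the PCRE functions and a single-line copy of the list from the question; the variable names are only for illustration:

```php
<?php
// Assumed input: the question's list on one line, so that "." can run
// from one <li> to the next without needing the s flag.
$html = '<ul><li> some content</li><li> some other content</li></ul>';

// Greedy: .* extends to the last </li> it can reach.
preg_match_all('~<li>(.*)</li>~', $html, $greedy);

// Lazy: .*? stops at the first </li> after each <li>.
preg_match_all('~<li>(.*?)</li>~', $html, $lazy);

print_r($greedy[1]); // a single capture spanning both items
print_r($lazy[1]);   // one capture per item
```

The greedy pattern should give one capture running from the first <li> to the last </li>, while the lazy one gives one capture per item.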

Response to @CanSpice:

Of course, regex is not well suited to HTML. The OP could try something like <li>(?!.*<li>).*?</li>, depending on what he is doing, or better, use a parser. I can only direct the OP one step at a time.

Upvotes: 1

Rajkamal Subramanian

Reputation: 6944

var a = '<ul>'+
'<li> some content</li>'+
'<li> some other content</li>'+
'<li> some other other content.</li>'+
'</ul>'

a.split("<li>");

gives:

["<ul>", " some content</li>", " some other content</li>", " some other other content.</li></ul>"]

From there we can pick whatever we want.

Upvotes: 0

fvox

Reputation: 1087

<?php
$str = '<ul>
<li> some content</li>
<li> some other content</li>
<li> some other other content.</li>
</ul>';

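// Use ~ as the delimiter: with / as the delimiter, the unescaped /
// in </li> would end the pattern early and cause an error.
preg_match_all('~<li>([^<]+)</li>~i', $str, $r);
print_r($r[1]);
?>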

Output:

Array
(
    [0] =>  some content
    [1] =>  some other content
    [2] =>  some other other content.
)

Upvotes: 0

anubhava

Reputation: 785256

Since you are matching HTML text, I would suggest using at least the s and i flags, like this:

'~<li>(.*?)</li>~is'
  • s is DOTALL: it makes the dot . match every character, including newlines
  • i makes the match case-insensitive
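
As a rough illustration of what the two flags change, here is a small sketch. The input is made up for the example (an upper-case tag pair and an item whose text spans two lines), and the variable names are hypothetical:

```php
<?php
// Hypothetical input: one upper-case <LI> pair and one item split over two lines.
$html = "<ul>\n<LI> first item</LI>\n<li> second\nitem</li>\n</ul>";

// No flags: <LI> is skipped because matching is case-sensitive, and "."
// cannot cross the newline inside the second item, so nothing matches.
$plainCount = preg_match_all('~<li>(.*?)</li>~', $html, $without);

// With i and s: both items match.
$flagCount = preg_match_all('~<li>(.*?)</li>~is', $html, $with);

print "$plainCount match(es) without flags, $flagCount with flags\n";
print_r($with[1]);
```

On this input the unflagged pattern finds nothing, and the flagged one finds both items.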

Upvotes: 0

Jason McCreary

Reputation: 72991

Quantifiers in regular expressions are greedy by default. Make the match non-greedy by adding ? after the quantifier.

<li>(.*?)</li>

Note: I'd encourage a DOM Parser for such a thing. Check out PHP's DOMDocument.
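
As a sketch of why the parser is the safer route, assume one item contains nested markup (the <b> tag and the variable names here are invented for illustration). The lazy regex would capture the inner tags along with the text, whereas DOMDocument's textContent returns the text only:

```php
<?php
// Hypothetical input: the first item contains a nested <b> element.
$html = '<ul><li> some <b>bold</b> content</li><li> some other content</li></ul>';

$document = new DOMDocument();
$document->loadHTML($html);

$items = [];
foreach ($document->getElementsByTagName('li') as $li) {
    // textContent joins the text of all descendants, with the tags dropped.
    $items[] = $li->textContent;
}

print_r($items); // one entry per <li>
```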

Upvotes: 8

grundprinzip

Reputation: 2491

Try making the regexp non-greedy:

<li>(.*?)</li>

Upvotes: 0
