Reputation: 513
For example, I have the HTML:
<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>
How can I get all the text inside the strong and span tags with a regex?
Upvotes: 1
Views: 1205
Reputation: 68476
Use a DOM parser, and never use regular expressions to parse HTML.
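// $html holds the markup from the question.
$dom = new DOMDocument;
$dom->loadHTML($html);
// Print the text of every <strong> element.
foreach ($dom->getElementsByTagName('strong') as $tag) {
    echo $tag->nodeValue."<br>";
}
// Print the text of every <span> element.
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue."<br>";
}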
Output:
this one
this two
this three
test one
test two
test three
HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML.
That is from an article by our own Jeff Atwood. Read more here.
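To make the point concrete, here is a minimal sketch with a nested tag. The input string and the variable names ($regexHits, $domText) are made up for illustration; the pattern is the simple "anything but <" kind that is usually suggested for this task.

```php
<?php
// Hypothetical input: one <span> nested inside another.
$html = '<span>outer <span>inner</span> tail</span>';

// Regex attempt: [^<]* cannot cross the inner tag, so the outer span never matches.
preg_match_all('~<span>([^<]*)</span>~', $html, $regexHits);

// DOM attempt: the parser builds a tree, so the outer span's text is complete.
$dom = new DOMDocument;
$dom->loadHTML($html);
$domText = $dom->getElementsByTagName('span')->item(0)->nodeValue;

print_r($regexHits[1]);   // only the inner text is captured
echo $domText, PHP_EOL;   // full text of the outer span
```

The regex reports only "inner", while the DOM gives "outer inner tail" for the first span. For flat markup like the question's the difference does not show up, but it appears as soon as tags nest.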
Upvotes: 2
Reputation: 76646
Use DOMDocument to load the HTML string and then use an XPath expression to get the required values:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// The union selects <strong> and <span> nodes in document order.
foreach ($xpath->query('//strong | //span') as $node) {
    echo $node->nodeValue, PHP_EOL;
}
Output:
this one
test one
this two
test two
this three
test three
Upvotes: 2
Reputation: 32189
You can use capturing groups. Here are some examples:
<strong>([^\<]*)<\/strong>
Demo: http://regex101.com/r/sK5uF2
And
<span>([^\<]*)<\/span>
Demo: http://regex101.com/r/vJ2kP3
In each of these, the first captured group is your text: \1 or $1
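Since the other answers use PHP, here is a rough sketch of how those two patterns could be applied with preg_match_all. The $html, $strong and $span variable names are assumptions, and ~ is used as the delimiter so the slash needs no escaping. The \< escape from the patterns above is dropped because PCRE does not need it inside a character class. This only holds up for flat, attribute-free tags like the ones in the question.

```php
<?php
// Sample markup from the question; $html is an assumed variable name.
$html = '<strong>this one</strong> <span>test one</span>' . "\n"
      . '<strong>this two</strong> <span>test two</span>' . "\n"
      . '<strong>this three</strong> <span>test three</span>';

// Collect every match of each pattern.
preg_match_all('~<strong>([^<]*)</strong>~', $html, $strong);
preg_match_all('~<span>([^<]*)</span>~', $html, $span);

// Index 0 holds the full matches; index 1 holds the first capturing group.
print_r($strong[1]);
print_r($span[1]);
```

$strong[1] ends up holding the three "this ..." strings and $span[1] the three "test ..." strings.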
Upvotes: 0