Egi

Reputation: 513

Get all text inside HTML tags with regex?

For example, I have the HTML:

<strong>this one</strong> <span>test one</span>
<strong>this two</strong> <span>test two</span>
<strong>this three</strong> <span>test three</span>

How can I get all the text inside the strong and span tags with a regex?

Upvotes: 1

Views: 1205

Answers (3)

Use a DOM parser, and never use regular expressions to parse HTML.

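// $html holds the markup from the question.
$dom = new DOMDocument;
$dom->loadHTML($html);
// Print the text of every <strong> element, then every <span> element.
foreach ($dom->getElementsByTagName('strong') as $tag) {
    echo $tag->nodeValue . "<br>";
}
foreach ($dom->getElementsByTagName('span') as $tag) {
    echo $tag->nodeValue . "<br>";
}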

Output:

this one
this two
this three
test one
test two
test three

Why shouldn't I use regular expressions to parse HTML content?

HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. Even the enhanced, irregular regular expressions used by Perl are not up to the task of parsing HTML.

The passage above is quoted from an article by Jeff Atwood, which goes into more detail.
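
To make the point concrete, here is a minimal sketch comparing a simple tag-matching pattern with the DOM approach. The input string, with its extra attribute and nested tag, is a hypothetical variation on the question's markup, and the variable names are only illustrative.

```php
<?php
// Hypothetical input: similar to the question's markup, but with an
// attribute on <strong> and a nested <em> inside it.
$html = '<strong class="x">this <em>one</em></strong> <span>test one</span>';

// A simple pattern that expects a bare <strong> tag with no nested markup.
preg_match_all('/<strong>([^<]*)<\/strong>/', $html, $matches);
$regexResult = $matches[1];

// The DOM approach reads the element's text regardless of attributes
// or nested tags.
$dom = new DOMDocument;
$dom->loadHTML($html);
$domResult = [];
foreach ($dom->getElementsByTagName('strong') as $tag) {
    $domResult[] = $tag->nodeValue;
}

var_dump($regexResult); // empty array: the pattern never matches
var_dump($domResult);   // one entry: "this one"
```

The pattern could be extended to allow attributes, but each extension tends to uncover another case it misses, which is the argument for using a parser.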

Upvotes: 2

Amal Murali

Reputation: 76646

Use DOMDocument to load the HTML string and then use an XPath expression to get the required values:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

foreach ($xpath->query('//strong | //span') as $node) {
    echo $node->nodeValue, PHP_EOL;
}

Output:

this one
test one
this two
test two
this three
test three

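If you need the values as an array rather than printed output, the same XPath query can be wrapped in a small function. This is only a sketch: extractTagText is a hypothetical helper name, not part of the DOM extension, and the sample input is a shortened version of the question's markup.

```php
<?php
// Hypothetical helper: returns the text of every <strong> and <span>
// element, in document order.
function extractTagText(string $html): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $texts = [];
    foreach ($xpath->query('//strong | //span') as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

// Shortened sample input based on the question.
$html = '<strong>this one</strong> <span>test one</span> '
      . '<strong>this two</strong> <span>test two</span>';

$texts = extractTagText($html);
print_r($texts); // this one, test one, this two, test two
```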

Upvotes: 2

sshashank124

Reputation: 32189

You can use capturing groups. Here are some examples:

<strong>([^\<]*)<\/strong>

Demo: http://regex101.com/r/sK5uF2

And

<span>([^\<]*)<\/span>

Demo: http://regex101.com/r/vJ2kP3

In each of these, the text you want is the first capture group, referenced as \1 or $1.
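
As a rough sketch of how these patterns could be applied, assuming PHP (as in the other answers) and the question's markup stored in a variable named $html, preg_match_all collects the first capture group of every match in index 1 of its result array:

```php
<?php
// The markup from the question, assumed here to be held in $html.
$html = "<strong>this one</strong> <span>test one</span>\n"
      . "<strong>this two</strong> <span>test two</span>\n"
      . "<strong>this three</strong> <span>test three</span>";

// Index 0 of each result holds the full matches; index 1 holds the
// text captured by the first group.
preg_match_all('/<strong>([^<]*)<\/strong>/', $html, $strong);
preg_match_all('/<span>([^<]*)<\/span>/', $html, $span);

print_r($strong[1]); // this one, this two, this three
print_r($span[1]);   // test one, test two, test three
```

This only works while the tags carry no attributes and contain no nested markup.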

Upvotes: 0
