aditya parikh

Reputation: 595

Regular expression pattern match

From a string containing HTML content, I want to extract the text between the first occurrence of an <a> tag and the first <span> tag that follows it.

My pattern is as follows:

$pattern='/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+<\/a> <span/um';

The match I get runs from the first occurrence of <a to the last occurrence of <span, instead of stopping at the first <span after that <a.

For example, given this HTML content:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah

What I want:

<a href="#">asdasdasd</a> <span

What I get:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span

Upvotes: 0

Views: 126

Answers (2)

doublesharp

Reputation: 27609

You need to make the quantifier lazy rather than greedy. Writing .+? instead of .+ tells the regular expression to match as few characters as possible between <a and <span:

$ptn = '/<a.+?<span/';
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';
preg_match($ptn, $str, $matches);
echo $matches[0];

The result is <a href="#">asdasdasd</a> <span
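To see why the ? matters, here is a small sketch that runs the greedy and the lazy form side by side on the sample string from the question. Only the two patterns differ; the variable names $greedy and $lazy are just for illustration.

```php
<?php
// Sample string from the question.
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';

// Greedy: .+ first runs to the end of the string, then backtracks only until
// "<span" can match, so the match ends at the LAST "<span".
preg_match('/<a.+<span/', $str, $greedy);

// Lazy: .+? starts with one character and grows only as needed, so the match
// ends at the FIRST "<span" after "<a".
preg_match('/<a.+?<span/', $str, $lazy);

echo $greedy[0], "\n"; // <a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span
echo $lazy[0], "\n";   // <a href="#">asdasdasd</a> <span
```

The greedy result reproduces the "What I get" output from the question, and the lazy result is the wanted one.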

Upvotes: 0

pogo

Reputation: 1550

  1. Use an HTML parser for parsing HTML.
  2. Use a lazy quantifier: '/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+?<\/a> <span/um';
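The first suggestion can be sketched with PHP's built-in DOM extension. This is a minimal sketch, assuming the DOM extension is enabled and that what you actually need is the first <a> element and the first <span> element after it. A parser works with whole elements, so it will not hand back the raw fragment ending in "<span" the way the regex does.

```php
<?php
$html = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';

// The sample is a fragment with repeated attributes, so collect libxml's
// warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// First <a> element in document order.
$link = $xpath->query('//a')->item(0);

// First <span> element that comes after that <a> (the following axis
// excludes the <a>'s own descendants).
$span = $xpath->query('following::span[1]', $link)->item(0);

echo $link->textContent, "\n"; // asdasdasd
echo $doc->saveHTML($link), "\n";
echo $doc->saveHTML($span), "\n";
```

Working on DOM nodes means attributes, nesting and stray "<" characters inside the markup cannot throw the match off the way they can with a hand-written character class.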

Upvotes: 1
