Kirzilla

Reputation: 16596

Regular expression: how to find all A tags which do not contain an IMG tag inside?

Suppose we have the following HTML. We need to get all <a href=""></a> tags which DO NOT contain an img tag inside them.

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>

I'm using this regular expression to find all the a tag links:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $out);

I can modify it like this:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $out);

But how can I tell it to exclude results that contain the <img substring inside <a href=""></a>?

Upvotes: 0

Views: 1423

Answers (2)

gbro3n

Reputation: 6967

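DOM is the way to go, but for the sake of interest here is a regex solution:

The easiest way to exclude certain matches in a regular expression is to use a 'negative look-ahead' or a 'negative look-behind'. A negative look-ahead makes the match fail at the current position if its sub-pattern can match from there. Placed right after ^ and padded with .+ on both sides, it rejects any line in which the sub-pattern appears.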

Example:

^(?!.+<img.+)<a href=\"?\'?.+\"?\'?>.+</a>$

Matches:

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>

But does not match:

<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>

The negative look-ahead is this part of the pattern:

(?!.+<img.+)

This says: do not match any string that has one or more characters followed by <img, followed by one or more characters.
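As a rough check of the look-ahead described above, here is a minimal PHP sketch that runs the pattern against each of the four sample lines separately. The ~ delimiter and the $lines / $results variable names are my own choices, not part of the original answer, and the backslashes that came from string escaping are dropped.

```php
<?php
// The answer's pattern, written as a single-quoted PHP string.
// "~" is used as the delimiter so the "/" in "</a>" needs no escaping.
$pattern = '~^(?!.+<img.+)<a href="?\'?.+"?\'?>.+</a>$~';

// The four sample links from the question, one per string.
$lines = [
    '<a href="http://domain1.com"><span>Here is link</span></a>',
    '<a href="http://domain2.com" title="">Hello</a>',
    '<a href="http://domain3.com" title=""><img src="" /></a>',
    '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>',
];

// preg_match returns 1 on a match and 0 on no match.
$results = [];
foreach ($lines as $line) {
    $results[] = preg_match($pattern, $line) === 1;
}

var_dump($results);
```

The first two lines should match and the last two should not, because the look-ahead finds <img somewhere after the start of those lines.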

<a href=\"?\'?.+\"?\'?>.+</a>

The rest is my general match for anchor tags in HTML; you might want to use a different match expression.

You may need to omit the start and end anchors (^ and $), depending on your usage.

More info on look-ahead / look-behind:
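If the anchors are dropped so that several links can be found in one string with preg_match_all, a look-ahead at the start no longer helps, because it would only be tested once per starting position against everything that follows. One possible workaround, offered here as a sketch rather than as the answer's own method, is to test the look-ahead before every character of the link body so the body can never run into <img or past </a>. The pattern below is adapted from the question's expression; the $content and $out names follow the question, and the rest is assumed.

```php
<?php
// Sample input from the question, all links in one string.
$content = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Group 1: the href value. Group 2: the link body.
// (?:(?!<img\b|</a>).)* consumes one character at a time, and only while
// neither "<img" nor "</a>" starts at the current position.
$pattern = '~<a\b[^>]*\bhref=["\']?([^ "\'>]+)["\']?[^>]*>((?:(?!<img\b|</a>).)*)</a>~is';

preg_match_all($pattern, $content, $out);

print_r($out[1]); // href values of links without an <img> inside
print_r($out[2]); // the corresponding link bodies
```

With the sample input this should keep only the domain1 and domain2 links. It is still a regex over HTML, so unusual markup (comments, nested anchors, attributes containing ">") can defeat it.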

http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html

Upvotes: 2

DisgruntledGoat

Reputation: 72550

You need to use an HTML parser like the Simple DOM parser. You cannot parse HTML with regular expressions.
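For illustration, here is a minimal sketch of the parser approach. It uses PHP's bundled DOMDocument and DOMXPath classes instead of the Simple DOM parser named above, on the assumption that the dom extension is enabled; the XPath expression and variable names are mine.

```php
<?php
// Sample input from the question.
$html = <<<'HTML'
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select every <a> that has an href attribute and no <img> descendant.
$links = [];
foreach ($xpath->query('//a[@href][not(.//img)]') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

With the sample input, $links should contain the domain1 and domain2 URLs only. The filter lives in the XPath predicate not(.//img), so it also covers images nested deeper inside the link.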

Upvotes: 3
