Reputation: 32755

How can I do a "does not contain" operation in regex?

This is my string:

<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';

I want to produce this output:

Some data,More data

Right now, I do this in PHP to filter out the data:

$rePlaats = "#<br/>([^<]*)<br/>[^<]*<br/>';#";
$aPlaats = array();
preg_match($rePlaats, $lnURL, $aPlaats);    // $lnURL is the source string
$evnPlaats = $aPlaats[1];

This would work if it weren't for the <span> tags, that is, if the string looked like this:

<br/>Some data,More data<br/>(more data)<br/>';

I will have to rewrite the regex to tolerate HTML tags (except for <br/>) and then strip out the <span> tags with the strip_tags() function. How can I do a "does not contain" operation in regex?

Upvotes: 0

Answers (4)

ghostdog74

Reputation: 342333

Don't trouble yourself with too much regex. Use the normal PHP string functions:

$str = "<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';";
$s = explode("</span>",$str);
for($i=0;$i<count($s)-1;$i++){
    print preg_replace("/.*>/","",$s[$i]) ."\n"; #minimal regex
}

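Explode on "</span>", since each piece of data you want sits right before a "</span>". Then go through every element of the array and remove everything from the start up to and including the last ">". That leaves your data. The last element is skipped because it holds only what follows the final "</span>".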

Output:

$ php test.php
Some data
More data

Upvotes: 1

yu_sha

Reputation: 4410

Don't listen to these DOM purists. Parse HTML with DOM and you'll end up with an incomprehensible tree. It's perfectly OK to parse HTML with regex if you know what you are after.

Step 1) Replace <br/> with {break}

Step 2) Replace <[^>]*> with empty string

Step 3) Replace {break} with <br/>
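
The three steps above could be sketched in PHP roughly as follows. This is only an illustration: the function name stripTagsExceptBreaks is made up for the example, the sample string is the one from the question with the quote escaping removed, and it assumes the literal text {break} never occurs in the input. The final preg_match reuses the pattern from the question unchanged.

```php
<?php
// Sample input modeled on the question's string (assumption: quotes unescaped).
$html = "<br/><span style='background:yellow'>Some data</span>,"
      . "<span style='background:yellow'>More data</span><br/>(more data)<br/>';";

// Hypothetical helper: removes every tag except <br/>.
function stripTagsExceptBreaks(string $html): string
{
    // Step 1: hide the <br/> tags behind a placeholder.
    $text = str_replace('<br/>', '{break}', $html);
    // Step 2: delete every remaining tag.
    $text = preg_replace('/<[^>]*>/', '', $text);
    // Step 3: put the <br/> tags back.
    return str_replace('{break}', '<br/>', $text);
}

$clean = stripTagsExceptBreaks($html);

// The pattern from the question now matches, because no <span> tags are left.
$matches = array();
preg_match("#<br/>([^<]*)<br/>[^<]*<br/>';#", $clean, $matches);
echo $matches[1], "\n"; // prints: Some data,More data
```

The placeholder trick sidesteps the "does not contain" problem: instead of writing a pattern that matches every tag except <br/>, the tag to keep is taken out of the way before the blanket replace.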

Upvotes: 2

LorenVS

Reputation: 12857

If you really want to use regular expressions for this, you're better off using regex replaces. This regex SHOULD match tags; I just whipped it up off the top of my head, so it might not be perfect:

</?[a-zA-Z0-9]{1,20}(\s+[a-zA-Z0-9]{1,20}=(("[^"]*")|('[^']*'))){0,20}\s*/?>

Once all the tags are gone, the rest of the string manipulation should be pretty easy.

Upvotes: 0

RMcLeod

Reputation: 2581

As has been said many times, don't use regex to parse HTML. Use the DOM instead.
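
For comparison, a minimal sketch of the DOM route, assuming PHP's bundled DOM extension is available. The helper name spanTexts is invented for this example, and the sample string is the question's markup with the quote escaping and the trailing '; removed.

```php
<?php
// Sample input modeled on the question's string (assumption: quotes unescaped).
$html = "<br/><span style='background:yellow'>Some data</span>,"
      . "<span style='background:yellow'>More data</span><br/>(more data)<br/>";

// Hypothetical helper: returns the text of every <span> in an HTML fragment.
function spanTexts(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = array();
    foreach ($doc->getElementsByTagName('span') as $span) {
        $texts[] = $span->textContent;
    }
    return $texts;
}

echo implode(',', spanTexts($html)), "\n"; // prints: Some data,More data
```

Note that this joins the span contents with a comma rather than preserving whatever text sat between the spans in the source, which happens to give the requested output here.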

Upvotes: -1
