Reputation: 32755
This is my string:
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
I want to produce this output:
Some data,More data
Right now, I do this in PHP to filter out the data:
$rePlaats = "#<br/>([^<]*)<br/>[^<]*<br/>';#";
$aPlaats = array();
preg_match($rePlaats, $lnURL, $aPlaats); // $lnURL is the source string
$evnPlaats = $aPlaats[1];
This would work if it weren't for these <span> tags. Without them the string would look like this, and the regex would match it:
<br/>Some data,More data<br/>(more data)<br/>';
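To see why the pattern breaks, here is a small sketch that runs the same regex against both versions of the string. The variable names $withSpans and $withoutSpans are only for illustration. The capture [^<]* cannot step over the "<" that opens <span>, so only the tag-free version matches:

```php
<?php
$rePlaats = "#<br/>([^<]*)<br/>[^<]*<br/>';#";

// Nowdoc (<<<'EOT') keeps the \' sequences from the question literally.
$withSpans = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;
$withoutSpans = <<<'EOT'
<br/>Some data,More data<br/>(more data)<br/>';
EOT;

// [^<]* stops at the "<" of <span>, and the next token must be <br/>, so no match.
var_dump(preg_match($rePlaats, $withSpans, $m1));    // int(0)
var_dump(preg_match($rePlaats, $withoutSpans, $m2)); // int(1)
echo $m2[1], "\n";                                   // Some data,More data
```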
I will have to rewrite the regex to tolerate HTML tags (except for <br/>) and then strip out the <span> tags with the strip_tags() function. How can I do a "does not contain" operation in regex?
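One common way to express "does not contain <br/>" in PCRE is a tempered token, (?:(?!<br/>).)*, which consumes one character at a time but only where a <br/> does not begin at that position. A minimal sketch, assuming the input is exactly the sample string above and the data always sits between the first two <br/> tags:

```php
<?php
$lnURL = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;

// (?:(?!<br/>).)* matches any run of characters that contains no <br/>,
// so the group accepts <span> and other tags but stops before the next <br/>.
$rePlaats = "#<br/>((?:(?!<br/>).)*)<br/>[^<]*<br/>';#";

$aPlaats = array();
$evnPlaats = '';
if (preg_match($rePlaats, $lnURL, $aPlaats)) {
    // Remove the <span> tags that are still inside the captured text.
    $evnPlaats = strip_tags($aPlaats[1]);
}
echo $evnPlaats, "\n"; // Some data,More data
```

Note that the lookahead is checked at every character, so on very long inputs this is slower than a plain [^<]* class; for a string of this size it does not matter.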
Upvotes: 0
Views: 471
Reputation: 342333
Don't fret yourself with too much regex. Use your normal PHP string functions:
$str = "<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';";
$s = explode("</span>",$str);
for($i=0;$i<count($s)-1;$i++){
print preg_replace("/.*>/","",$s[$i]) ."\n"; #minimal regex
}
Explode on "</span>", since every piece of data you want sits right before a "</span>". Then go through every element of the array and remove everything from the start up to the last ">"; what remains is your data. The last element (the trailing <br/>(more data)<br/>';) is skipped.
Output:
$ php test.php
Some data
More data
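The loop above prints each value on its own line, and the greedy ".*>" also removes the comma that separated them. If you want the exact "Some data,More data" output from the question, a minimal sketch along the same lines collects the pieces and joins them again. It assumes the separator is always a single comma; the join string is an assumption, not something read from the input:

```php
<?php
$str = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;

// Split on </span>; every piece except the last ends with one data value.
$pieces = explode('</span>', $str);
array_pop($pieces); // drop the trailing "<br/>(more data)<br/>';" piece

// Greedy ".*>" removes everything up to and including the last ">" in each piece.
$data = array_map(function ($piece) {
    return preg_replace('/.*>/', '', $piece);
}, $pieces);

// Assumption: the values were separated by a comma in the original string.
$result = implode(',', $data);
echo $result, "\n"; // Some data,More data
```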
Upvotes: 1
Reputation: 4410
Don't listen to these DOM purists. If you parse HTML with DOM, you'll end up with an incomprehensible tree. It's perfectly OK to parse HTML with regex if you know what you are after.
Step 1) Replace <br */?> with {break}
Step 2) Replace <[^>]*> with an empty string
Step 3) Replace {break} with <br>
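A minimal PHP sketch of the three steps applied to the question's string. {break} is just a placeholder and is assumed never to occur in the input; the final preg_match is the question's own [^<]* idea, adapted to the plain <br> tags that step 3 produces:

```php
<?php
$str = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;

// Step 1: protect line breaks by turning <br>, <br/> and <br /> into a placeholder.
$step1 = preg_replace('#<br */?>#', '{break}', $str);
// Step 2: remove every remaining tag (<span ...>, </span>, ...).
$step2 = preg_replace('/<[^>]*>/', '', $step1);
// Step 3: put the line breaks back.
$clean = str_replace('{break}', '<br>', $step2);
echo $clean, "\n"; // <br>Some data,More data<br>(more data)<br>';

// With the spans gone, a simple "no < inside" capture is enough again.
preg_match('#<br>([^<]*)<br>#', $clean, $m);
echo $m[1], "\n"; // Some data,More data
```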
Upvotes: 2
Reputation: 12857
If you really want to use regular expressions for this, you're better off using regex replacements. This regex SHOULD match tags; I just whipped it up off the top of my head, so it might not be perfect:
</?[a-zA-Z0-9]{0,20}(\s+[a-zA-Z0-9]{0,20}=(("[^"]*")|('[^']*'))){0,20}\s*[/]{0,1}>
Once all the tags are gone, the rest of the string manipulation should be pretty easy.
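A minimal sketch of the regex-replace approach. It uses a tag pattern in the spirit of the one above, written so that quoted attribute values of any length and closing tags such as </span> also match. It assumes the \' sequences in the question's string are escaping left over from the source, so stripslashes() turns them into plain quotes first; the pattern only recognizes attribute values wrapped in real quotes:

```php
<?php
$str = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;

// Assumption: the \' sequences are escaping artifacts; turn them into plain quotes.
$unescaped = stripslashes($str);

// Optional leading "/", a tag name, up to 20 attributes with "..." or '...' values,
// then an optional self-closing "/".
$tagPattern = '#</?[a-zA-Z0-9]{0,20}(\s+[a-zA-Z0-9]{0,20}=("[^"]*"|\'[^\']*\')){0,20}\s*/?>#';

$text = preg_replace($tagPattern, '', $unescaped);
echo $text, "\n"; // Some data,More data(more data)';
```

What is left is plain text, so cutting off the trailing "(more data)';" is ordinary string work.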
Upvotes: 0
Reputation: 2581
As has been said many times, don't use regex to parse HTML. Use the DOM instead.
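A minimal sketch of the DOM route with PHP's DOMDocument (requires the dom extension). It assumes the values you want are exactly the text contents of the <span> elements and that they should be joined with a comma; the comma is re-added by implode(), not read from the document:

```php
<?php
$html = <<<'EOT'
<br/><span style=\'background:yellow\'>Some data</span>,<span style=\'background:yellow\'>More data</span><br/>(more data)<br/>';
EOT;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$values = array();
foreach ($doc->getElementsByTagName('span') as $span) {
    $values[] = $span->textContent; // text inside the <span>, markup removed
}

// Assumption: the values should be separated by a comma, as in the desired output.
$result = implode(',', $values);
echo $result, "\n"; // Some data,More data
```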
Upvotes: -1