Reputation: 1665
I have the following string:
blah blah yo<desc>some text with description - unwanted
text</desc>um hey now some words yah<desc>some other description text
stuff - more unwanted here</desc>random word and ; things. Now a hyphen
outside of desc tag - with other text<desc>yet another description - unwanted
</desc>and that's about it.
(Note: In reality there are no newline/carriage returns in the string. I only added them here for readability.)
I want to select only the part of each desc tag's text from the hyphen onward, including the space before the hyphen and the closing desc tag. That part was simple; I just did this:
\s-.*?<\/desc>
Now, the problem is that the hyphen outside the desc tag is getting selected too. So my selections are as follows:
- unwanted text</desc>
- more unwanted here</desc>
- with other text<desc>yet another description - unwanted</desc>
So the first two are perfect but see how that last line is messed up because of the - outside the desc tag?
Just FYI, if interested, in my code I am doing a replace like this:
$text = preg_replace('/\s-.*?<\/desc>/', '</desc>', $text);
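For anyone who wants to reproduce this, here is a minimal sketch that runs the pattern above against the sample string (joined onto one line, with the last desc tag closed). It shows why the third match overshoots: the engine always takes the leftmost possible starting point, and the lazy .*? only makes the match as short as possible from that starting point. It does not move the start forward to a later hyphen.

```php
<?php
// Sample string from the question, joined onto a single line.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . '<desc>yet another description - unwanted</desc>and that\'s about it.';

// Collect every match of the original pattern.
preg_match_all('/\s-.*?<\/desc>/', $text, $matches);

foreach ($matches[0] as $m) {
    // Brackets make the leading space of each match visible.
    echo '[', $m, "]\n";
}
// [ - unwanted text</desc>]
// [ - more unwanted here</desc>]
// [ - with other text<desc>yet another description - unwanted</desc>]
```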
I tried a few lookbehind approaches but could not get them to work.
Any ideas?
Thanks! Mark
Upvotes: 2
Views: 97
Reputation: 145512
You could try [^-<>]* instead of .*? in your pattern. This restricts what the match can span: it can no longer cross another hyphen or an angle bracket, so those characters effectively act as delimiters.
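As a rough sketch of how that would look in the asker's preg_replace call (the sample string is the one from the question, joined onto one line with the last desc tag closed):

```php
<?php
// Sample string from the question, joined onto a single line.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . '<desc>yet another description - unwanted</desc>and that\'s about it.';

// After " -", allow only characters that are not a hyphen or an angle bracket.
// The hyphen outside the tags is followed by "<desc>", not "</desc>", so it cannot match.
$result = preg_replace('/\s-[^-<>]*<\/desc>/', '</desc>', $text);

echo $result, "\n";
// blah blah yo<desc>some text with description</desc>um hey now some words yah<desc>some other description text stuff</desc>random word and ; things. Now a hyphen outside of desc tag - with other text<desc>yet another description</desc>and that's about it.
```

One caveat, assuming the unwanted text can itself contain a hyphen (for example "well-known"): the negated class stops at that hyphen, the match fails, and that desc tag is left unchanged.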
Upvotes: 1
Reputation: 169320
If desc is the only tag that can appear in this block, you could use a horrible hack like this:
$text = preg_replace('/\s-[^<]*?<\/desc>/', '</desc>', $text);
But if this needs to be bulletproof, you can't reliably do this with a regular expression. You might try using an XML parser and processing the resultant DOM.
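A minimal sketch of the parser route, under several assumptions that are not in the question: the DOM extension is available, the string becomes well-formed XML once it is wrapped in a root element (the name root is made up here), and the text contains no stray & or < characters. The sample string is shortened and hypothetical.

```php
<?php
// Hypothetical, shortened sample; the real input is assumed to have the same shape.
$text = 'yo<desc>some text - unwanted text</desc> outside - hyphen<desc>other - junk</desc>end';

$doc = new DOMDocument();
// Wrap the fragment in a made-up <root> element so it parses as one XML document.
$doc->loadXML('<root>' . $text . '</root>');

foreach ($doc->getElementsByTagName('desc') as $desc) {
    // Inside a desc element only: drop everything from the first " -" to the end.
    $desc->nodeValue = preg_replace('/\s-.*$/s', '', $desc->nodeValue);
}

// Serialize the children of <root> back into a plain string, without the wrapper.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveXML($child);
}

echo $result, "\n";
```

Because the trimming regex only ever sees the text of one desc element at a time, a hyphen outside the tags can never be part of a match.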
Upvotes: 1