user390480

Reputation: 1665

RegEx selecting more than I want (PHP)

I have the following string:

blah blah yo<desc>some text with description - unwanted 
text</desc>um hey now some words yah<desc>some other description text 
stuff - more unwanted here</desc>random word and ; things. Now a hyphen 
outside of desc tag - with other text<desc>yet another description - unwanted
</desc>and that's about it.

(Note: In reality there are no newline/carriage returns in the string. I only added them here for readability.)

I want to select only the text inside the desc tag from the hyphen forward, including the preceding space and the closing desc tag. That part was simple:

\s-.*?<\/desc>

Now, the problem is that the hyphen outside the desc tag is getting selected too. So my selections are as follows:

- unwanted text</desc>
- more unwanted here</desc>
- with other text<desc>yet another description - unwanted</desc>

So the first two are perfect, but the last one is wrong because the hyphen outside the desc tag starts a match too.
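The over-match can be reproduced with a short script. This is a minimal sketch, assuming the sample string above is held in $text with the line breaks removed:

```php
<?php
// Sample string from the question, joined without line breaks.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// The original pattern: whitespace, hyphen, then a lazy match up to the next </desc>.
preg_match_all('/\s-.*?<\/desc>/', $text, $matches);

// Print every match on its own line.
foreach ($matches[0] as $match) {
    echo $match, PHP_EOL;
}
```

The lazy .*? only limits where a match ends, not where it begins, so the third match starts at the first " -" it finds, which is the one outside the tag.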

Just FYI, if interested, in my code I am doing a replace like this:

$text = preg_replace('/\s-.*?<\/desc>/', '</desc>', $text);

I tried a few lookbehind approaches but could not get them to work.

Any ideas?

Thanks! Mark

Upvotes: 2

Views: 97

Answers (3)

mario

Reputation: 145512

You could try [^-<>]* instead of .*?. This restricts what the regex can select: angle brackets and hyphens act as delimiters that the match cannot cross, so a match that starts outside a desc tag can never reach the closing tag.
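A minimal sketch of that substitution applied to the replace call from the question, assuming the sample string is in $text without line breaks. The second string, $hyphenated, is a made-up input that shows one limitation: a hyphen inside the unwanted text stops the pattern from matching at all.

```php
<?php
// Sample string from the question, joined without line breaks.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// [^-<>]* cannot cross a hyphen or an angle bracket, so the match
// that begins at the outside hyphen fails when it reaches "<desc>".
$result = preg_replace('/\s-[^-<>]*<\/desc>/', '</desc>', $text);
echo $result, PHP_EOL;

// Hypothetical input: the unwanted part itself contains a hyphen,
// so nothing matches and the string comes back unchanged.
$hyphenated = '<desc>title - a well-known fact</desc>';
$unchanged = preg_replace('/\s-[^-<>]*<\/desc>/', '</desc>', $hyphenated);
echo $unchanged, PHP_EOL;
```

On the sample string this leaves the " - with other text" outside the tag alone and trims all three descriptions.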

Upvotes: 1

cdhowie

Reputation: 169320

If desc is the only tag that can appear in this block, you could use a horrible hack like this:

$text = preg_replace('/\s-[^<]*?<\/desc>/', '</desc>', $text);

But if this needs to be bulletproof, you can't reliably do this with a regular expression. You might try using an XML parser and processing the resultant DOM.
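The parser route could look roughly like the following. It is only a sketch under several assumptions: PHP's DOM extension is available, the fragment is wrapped in a made-up <root> element so that it parses as XML, and the text contains no unescaped & or other markup that would make it ill-formed.

```php
<?php
// Sample string from the question, joined without line breaks.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// Assumption: a hypothetical <root> wrapper turns the fragment into well-formed XML.
$doc = new DOMDocument();
$doc->loadXML('<root>' . $text . '</root>');

// Trim each <desc> from its first " -" onward; text outside the tags is never touched.
foreach ($doc->getElementsByTagName('desc') as $desc) {
    $desc->nodeValue = preg_replace('/\s-.*$/s', '', $desc->textContent);
}

// Serialize the children of <root> to get the fragment back without the wrapper.
$result = '';
foreach ($doc->documentElement->childNodes as $child) {
    $result .= $doc->saveXML($child);
}

echo $result, PHP_EOL;
```

Because the regex here only ever sees the text of one desc element at a time, a hyphen outside the tags cannot start a match.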

Upvotes: 1

whiskeysierra

Reputation: 5120

What about:

\s-[^-]*?<\/desc>
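A minimal sketch of this pattern in the replace call from the question, assuming the sample string is in $text without line breaks. The second input, $noHyphen, is made up to show a case where the pattern still over-matches: a stray hyphen followed by a desc tag that contains no hyphen of its own.

```php
<?php
// Sample string from the question, joined without line breaks.
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// [^-]*? cannot cross a second hyphen, so the match that begins at the
// outside hyphen fails when it hits the hyphen inside the next desc tag.
$result = preg_replace('/\s-[^-]*?<\/desc>/', '</desc>', $text);
echo $result, PHP_EOL;

// Hypothetical input: the desc after the stray hyphen has no hyphen,
// so the match runs from the outside hyphen through the closing tag.
$noHyphen = 'outside - text<desc>no hyphen here</desc>';
$overMatched = preg_replace('/\s-[^-]*?<\/desc>/', '</desc>', $noHyphen);
echo $overMatched, PHP_EOL;
```

On the sample string the result is the desired one. The second call shows that the pattern relies on every desc tag containing a hyphen, which may or may not hold for the real data.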

Upvotes: 1
