Luca Romagnoli

Reputation: 12475

regex selection

I have a string like this.

<p class='link'>try</p>bla bla</p>

I want to get only <p class='link'>try</p>. I have tried this:
/<p class='link'>[^<\/p>]+<\/p>/

But it doesn't work.

How can I do this? Thanks.

Upvotes: 3

Views: 125

Answers (4)

Doug

Reputation: 560

I tried to write one that is not tied to any particular tag.

(<[^/]+?\s+[^>]*>[^>]*>)

For the string in the question, this returns:

<p class='link'>try</p>

Upvotes: 0

Ray

Reputation: 4879

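It looks like you used the block [^<\/p>]+ intending to match anything except the sequence </p>. Unfortunately, that's not what it does. A [^...] block matches any single character that is not listed inside it, so [^<\/p>]+ matches a run of characters that are none of <, /, p, or >. For the exact string in the question this happens to stop right before </p>, because "try" contains none of those characters, but the pattern fails as soon as the text contains a p or a /, for example <p class='link'>help</p>.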
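To see how the negated character class behaves, here is a minimal sketch in Python's re module (the question does not say which language is in use, so Python is an assumption, and the second test string is made up for illustration). The / needs no escaping here because Python patterns have no / delimiters.

```python
import re

# The asker's pattern as a Python raw string.
# [^</p>] means "one character that is not <, /, p or >".
pattern = re.compile(r"<p class='link'>[^</p>]+</p>")

# "try" contains none of the excluded characters, so the class consumes it
# and the literal </p> that follows completes the match.
first = pattern.search("<p class='link'>try</p>bla bla</p>")
print(first.group(0))

# Hypothetical input: "help" contains a "p", which the class refuses to
# consume, so the literal </p> can never line up and nothing matches.
second = pattern.search("<p class='link'>help</p>bla bla</p>")
print(second)
```

The class excludes individual characters, not the sequence </p>, which is why ordinary text containing a p is enough to break it.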

Alex's solution, using a non-greedy quantifier, is how I tend to approach this sort of problem.

Upvotes: 0

Allen

Reputation: 263

'/<p[^>]+>([^<]+)<\/p>/'

This will capture "try" in the first group.

Upvotes: 0

alex

Reputation: 490607

If that is your string, and you want the text between those p tags, then this should work...

/<p\sclass='link'>(.*?)<\/p>/

The reason yours is not working is that you put <\/p> inside a negated character class. The class does not match that sequence literally; it excludes each of the characters <, /, p and > individually.
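As a rough illustration of why the lazy quantifier matters, the sketch below runs the pattern above in Python's re module (Python is an assumption, since the question does not name a language) next to a greedy variant that is added only for comparison.

```python
import re

text = "<p class='link'>try</p>bla bla</p>"

# Lazy: .*? expands as little as possible, so it stops at the first </p>.
lazy = re.search(r"<p\sclass='link'>(.*?)</p>", text)

# Greedy (for comparison only): .* expands as far as possible,
# so it runs on to the last </p> in the string.
greedy = re.search(r"<p\sclass='link'>(.*)</p>", text)

print(lazy.group(0))    # <p class='link'>try</p>
print(lazy.group(1))    # try
print(greedy.group(1))  # try</p>bla bla
```

Group 0 is the whole match, which is what the question asks for; group 1 is only the text between the tags.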

Of course, it is mandatory that I mention there are better tools for parsing HTML fragments (such as an HTML parser).
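For comparison, a parser-based approach might look like the sketch below. It uses Python's standard html.parser module as one possible choice (the language is an assumption), and the LinkParagraphs class is a hypothetical helper written for this example, not part of any library.

```python
from html.parser import HTMLParser


class LinkParagraphs(HTMLParser):
    """Hypothetical helper: collects the text of each <p class="link">."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while inside a matching <p>
        self.texts = []      # one entry per matching paragraph

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        # This simple check assumes the class attribute is exactly "link".
        if tag == "p" and dict(attrs).get("class") == "link":
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


parser = LinkParagraphs()
parser.feed("<p class='link'>try</p>bla bla</p>")
parser.close()
print(parser.texts)  # ['try']
```

The stray closing </p> at the end of the sample string does no harm here, because text is only collected between a matching opening tag and the next closing p tag.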

Upvotes: 4
