mbhargav294

Reputation: 386

Regular expression for anchor tags

I am trying to write a regular expression to match anchor tags, taking the link in href as group 2 and the text of the anchor tag as group 3. I wrote the following:

<a( href=\"(\S+)\")?.*>([a-zA-Z0-9 ]+)<\/a>

to match this text:

hello there <a href="Hello/world1">Hello World1</a><b>How are You<b><a href="Hello/world2">Hello World2</a>

But instead of matching Hello World1 for group 3, it matches Hello World2. Can someone please help me write a regular expression that gives group 2 = Hello/world1 and group 3 = Hello World1? Thanks.

Upvotes: 1

Views: 1238

Answers (1)

JosephRuby

Reputation: 475

The proper syntax for the example you have given would look something like:

(?:<a(?: href=[^>]+>([^<]+)<\/a>(?!<a)?))+
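
For context, the pattern in the question fails because `.*` is greedy: it runs to the end of the line and backtracks only as far as the last `>` that still lets the rest match, so the text group ends up holding Hello World2. Below is a minimal Python sketch of one way around that, using negated character classes so a match cannot cross a tag boundary. The pattern and the name `ANCHOR_RE` are illustrative assumptions, not the regex above; it also assumes every anchor has a double-quoted href, and it numbers the captures as groups 1 and 2 rather than 2 and 3.

```python
import re

# Hypothetical pattern for illustration only.
# [^>]* stays inside the opening tag and [^<]+ stays inside the anchor text,
# so each match is confined to a single <a ...>...</a> element.
ANCHOR_RE = re.compile(r'<a\b[^>]*?\bhref="([^"]*)"[^>]*>([^<]+)</a>')

text = ('hello there <a href="Hello/world1">Hello World1</a>'
        '<b>How are You<b><a href="Hello/world2">Hello World2</a>')

# findall returns one (href, text) tuple per anchor, in document order.
links = ANCHOR_RE.findall(text)
print(links)
```

Each tuple holds the href first and the anchor text second, so the first anchor is simply `links[0]`.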

That said, using regex to parse HTML is strongly discouraged, as a language parser would be much more reliable and capable of handling all the situations that can occur in HTML.
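
As a rough illustration of the parser route, here is a sketch using Python's standard-library `html.parser`. The class name `AnchorCollector` and its attributes are made up for this example, and it assumes anchors are not nested inside one another.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects an (href, text) pair for every <a> element fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []          # finished (href, text) pairs
        self._in_anchor = False  # True while inside an open <a>
        self._href = None        # href of the currently open <a>, if any
        self._chunks = []        # text pieces seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")  # None when href is absent
            self._chunks = []

    def handle_data(self, data):
        if self._in_anchor:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.links.append((self._href, "".join(self._chunks)))
            self._in_anchor = False


text = ('hello there <a href="Hello/world1">Hello World1</a>'
        '<b>How are You<b><a href="Hello/world2">Hello World2</a>')

collector = AnchorCollector()
collector.feed(text)
collector.close()
print(collector.links)
```

Unlike a regex, this keeps working when attributes appear in a different order, use single quotes, or when the anchor has no href at all (the href is then reported as `None`).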

Upvotes: 1
