Reputation: 386
I am trying to write a regular expression that matches anchor tags, capturing the link in href as group 2 and the text of the anchor tag as group 3, as follows:
<a( href=\"(\S+)\")?.*>([a-zA-Z0-9 ]+)<\/a>
to match this text:
hello there <a href="Hello/world1">Hello World1</a><b>How are You<b><a href="Hello/world2">Hello World2</a>
But instead of matching Hello World1 for group 3, it matches Hello World2. Can someone please help me write a regular expression that matches group 2 = Hello/world1 and group 3 = Hello World1?
Thanks.
Upvotes: 1
Views: 1238
Reputation: 475
The proper syntax for the example you have given would look something like:
(?:<a(?: href=[^>]+>([^<]+)<\/a>(?!<a)?))+
but using regex to parse HTML is strongly discouraged, since a dedicated HTML parser is more reliable and capable of handling all the situations that can occur in HTML.
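For what it's worth, the pattern in the question fails because the greedy `.*` runs to the end of the input and then backtracks only as far as the last `>` that is followed by plain text and `</a>`, so the text group ends up holding the final anchor's text. Below is a minimal Python sketch of both routes, using only the standard library: a regex built on negated character classes, and a small parser subclass. The names `ANCHOR_RE`, `AnchorCollector`, `links_with_regex` and `links_with_parser` are made up for illustration, and the regex assumes double-quoted href values and anchors that contain no nested tags.

```python
import re
from html.parser import HTMLParser

# Sample input taken from the question.
TEXT = ('hello there <a href="Hello/world1">Hello World1</a>'
        '<b>How are You<b><a href="Hello/world2">Hello World2</a>')

# Negated classes ([^"], [^>], [^<]) cannot run past the end of a tag,
# unlike the greedy `.*` in the question's pattern.
# Assumption: href is double-quoted and the anchor text has no nested tags.
ANCHOR_RE = re.compile(r'<a\b(?:\s+href="([^"]*)")?[^>]*>([^<]+)</a>')


def links_with_regex(html):
    """Return a list of (href, text) tuples, one per anchor matched."""
    return ANCHOR_RE.findall(html)


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the anchor currently open
        self._text = None   # list of text chunks, or None outside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._text is not None:
            self.links.append((self._href, "".join(self._text)))
            self._text = None


def links_with_parser(html):
    """Return a list of (href, text) tuples using html.parser."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print(links_with_regex(TEXT))
    print(links_with_parser(TEXT))
```

On the sample text both functions should yield one (href, text) pair per anchor. The parser version keeps working when attributes are reordered or single-quoted, which is where the regex version starts to break down.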
Upvotes: 1