Roshan

Reputation: 2059

Regex to identify anchor tags that should not be nested

In an HTML source, I have to identify anchor tags, which should not be nested.

For example:

<a href="http://www.abc.com">abc<a href="http://www.dbc.com">dbc</a>

From this, the first match should return

<a href="http://www.abc.com">abc

The subsequent find should return

<a href="http://www.dbc.com">dbc</a>

Each find should return the text from the opening anchor tag through its closing anchor tag if the anchor is not nested. If it is nested, it should return the text from the opening anchor tag up to, but not including, the nested opening anchor tag.

Please help. Thanks in advance

Upvotes: 0

Views: 342

Answers (1)

Brian Agnew

Reputation: 272417

I'd suggest using JTidy. Despite its name, it's an HTML parser, and it will handle the edge cases that trip up regular expressions (not surprising, given that HTML isn't a regular language).
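If adding a parser is not an option and the input is known to be simple, the behaviour described in the question can be approximated with a regex. The following is only a minimal sketch using `java.util.regex`, not JTidy; the class name `AnchorFinder` and the method `findAnchors` are made up for illustration. It assumes attribute values never contain `>` and that the markup has no comments or scripts containing anchor-like text, which is exactly the kind of edge case a real parser handles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class AnchorFinder {
    // An opening <a ...> tag, followed by any characters that do not begin
    // another opening <a tag or a closing </a>, followed by an optional </a>.
    // Assumption: no attribute value contains '>'.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(?:(?!<a\\b|</a>).)*(?:</a>)?",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns each anchor fragment in document order. A nested anchor
    // truncates the outer one just before the nested opening tag.
    static List<String> findAnchors(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.abc.com\">abc"
                + "<a href=\"http://www.dbc.com\">dbc</a>";
        for (String fragment : findAnchors(html)) {
            System.out.println(fragment);
        }
    }
}
```

The negative lookahead `(?!<a\b|</a>)` is what stops the outer match before the nested opening tag, and the trailing `(?:</a>)?` picks up the closing tag only when it follows directly. For anything beyond small, predictable snippets, a parser remains the safer choice.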

Upvotes: 3
