I need to match only the first HTML link that has 'data-{someData}' attributes. I've written the following regex:
\<a\s+(.+)\s+data-\s*(.+)\s*>(.+)<\/a>
It works for a piece of HTML that contains only one link, like:
SOME TEXT/HTML
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
SOME TEXT/HTML
The problem is when the HTML contains more links: then the regex matches up to the last occurrence of </a>. So from the HTML below:
SOME TEXT/HTML
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
SOME TEXT/HTML
<a href="~/link.aspx?_id=1256272320C4429DAB8A1F40D429C841&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{12562723-20C4-429D-AB8A-1F40D429C841}"
data-dms-event="Content button">Link2
</a>
SOME TEXT/HTML
I need to fix my regex so that it matches only:
<a href="~/link.aspx?_id=B0B5056BD5984878BEB5C92AF6B74DB3&_z=z"
data-dms="{6782B150-F6FA-49E6-A2FF-6D6014470373}"
data-targetid="{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}"
data-dms-event="Content button">Link1
</a>
First off, have you looked at options other than regexp? Regexp is not the ideal tool for parsing HTML. If your language has a DOM parser, you should be able to extract the needed tag with it.
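A minimal sketch of the parser route, assuming Python (the question doesn't name a language) and its standard-library html.parser module. FirstDataLinkParser and first_data_link are hypothetical names. The parser records the attributes and text of the first <a> that carries at least one data-* attribute and ignores every later link, so there is no greedy/lazy problem to solve.

```python
from html.parser import HTMLParser


class FirstDataLinkParser(HTMLParser):
    """Hypothetical helper: remembers the first <a> that has a data-* attribute."""

    def __init__(self):
        super().__init__()
        self.link_attrs = None   # attributes of the first matching <a>, as a dict
        self.text_parts = []     # text chunks found inside that <a>
        self._inside = False     # True while we are inside the matching <a>
        self._done = False       # True once its closing </a> has been seen

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if self._done or tag != "a":
            return
        if any(name.startswith("data-") for name, _ in attrs):
            self.link_attrs = dict(attrs)
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if self._inside and tag == "a":
            self._inside = False
            self._done = True    # later links are ignored


def first_data_link(html):
    """Return (attributes dict, link text) of the first data-* link, or None."""
    parser = FirstDataLinkParser()
    parser.feed(html)
    parser.close()
    if parser.link_attrs is None:
        return None
    return parser.link_attrs, "".join(parser.text_parts).strip()
```

On the two-link sample from the question, first_data_link returns the attributes of the Link1 anchor (for example data-targetid "{B0B5056B-D598-4878-BEB5-C92AF6B74DB3}") and the text "Link1".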
That said, if you need to use regexp, there are two ways to get around the problem you are facing.
The first, and generally preferable, solution is to be more restrictive in what you match. Rather than matching any character with the dot (.), match only the characters that are legal at that position, using character classes such as [^>] (any character except >).
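A sketch of the restrictive approach, again assuming Python's re module. Because [^>] also matches line breaks, no DOTALL flag is needed, and it can never run past the end of the opening tag. The [^<]* used for the link text assumes the text contains no nested tags, and an attribute value containing > would also break the pattern. DATA_LINK and first_data_link are illustrative names only.

```python
import re

# [^>]* stays inside the opening tag (it can span newlines but never cross ">").
# [^<]* stays inside the link text and stops at the first "<", i.e. the </a>
# of the same link (assumes no nested tags inside the link text).
DATA_LINK = re.compile(r"<a\s+([^>]*\bdata-[^>]*)>([^<]*)</a>", re.IGNORECASE)


def first_data_link(html):
    """Return (raw attribute text, link text) of the first data-* link, or None."""
    m = DATA_LINK.search(html)  # search() returns the leftmost match only
    if m is None:
        return None
    return m.group(1), m.group(2).strip()
```

On the question's two-link sample the match ends at the first </a>, so Link2 is never included.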
The second is to use lazy (non-greedy) matching rather than greedy matching. This is done by adding ? after your quantifiers, i.e. replace + with +? and * with *?. A lazy quantifier consumes as few characters as possible, so the regexp stops at the first point where the rest of the pattern can match (here, the first </a>) rather than at the last one.
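To see the difference, here is the question's pattern in Python (language assumed), compiled once as written and once with its .+ groups made lazy. The re.DOTALL flag is an assumption: the sample markup spans several lines, so . has to match newlines for the original pattern to work at all. GREEDY and LAZY are illustrative names.

```python
import re

# The question's pattern as written: every .+ is greedy, so the engine grabs as
# much as possible and backtracks only as far as needed, ending at the LAST </a>.
# re.DOTALL lets "." cross the line breaks in the sample markup.
GREEDY = re.compile(r"<a\s+(.+)\s+data-\s*(.+)\s*>(.+)</a>", re.DOTALL)

# Same pattern with each .+ turned into the lazy .+?: each group grows one
# character at a time, so the match ends at the FIRST </a> that fits.
LAZY = re.compile(r"<a\s+(.+?)\s+data-\s*(.+?)\s*>(.+?)</a>", re.DOTALL)
```

Searching the two-link sample with GREEDY returns everything from the first <a to the last </a>, while LAZY stops after Link1. Lazy matching only shortens the match, though: if the first <a> on the page had no data- attribute, LAZY would still stretch across it into the next link's attributes, which is one reason the character-class approach is usually preferable.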