I want to parse all the link tags from an HTML file, so I have written the following regular expression:
```csharp
using System.Text.RegularExpressions;

var pattern = @"<(LINK).*?HREF=(""|')?(?<URL>.*?)(""|')?.*?>";
var regExOptions = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline;
var linkRegEx = new Regex(pattern, regExOptions);

// htmlFile holds the contents of the HTML file as a string
foreach (Match match in linkRegEx.Matches(htmlFile))
{
    var group = match.Groups["URL"];
    var url = group.Value;
}
```
The regex does find matches in the HTML file, but the `URL` capturing group is always empty.
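A minimal reproduction, assuming three hypothetical sample tags (not taken from the original file), shows every tag matching while `URL` stays empty. The likely cause: both `(""|')?` groups are optional and `(?<URL>.*?)` is lazy, so the engine first tries an empty `URL`, and that attempt succeeds because the trailing `.*?>` can absorb the address and the closing quote. The sketch uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged.
var pattern = @"<(LINK).*?HREF=(""|')?(?<URL>.*?)(""|')?.*?>";
var linkRegEx = new Regex(pattern, RegexOptions.IgnoreCase);

// Hypothetical sample tags: double-quoted, single-quoted and unquoted HREF values.
string[] inputs = { "<LINK HREF=\"Foo\">", "<LINK HREF='Bar'>", "<LINK HREF=Baz>" };

var urls = new List<string>();
foreach (var input in inputs)
{
    var match = linkRegEx.Match(input);
    // Each tag matches as a whole, yet URL captures "": the quote groups are
    // optional and the lazy (?<URL>.*?) is satisfied by zero characters, so the
    // trailing .*?> ends up consuming the address instead.
    var url = match.Groups["URL"].Value;
    urls.Add(url);
    Console.WriteLine($"{input} -> Success={match.Success}, URL=\"{url}\"");
}
```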
You could try a pattern like this:
```csharp
var pattern = @"<(LINK).*?HREF=(?:([""'])(?<URL>.*?)\2|(?<URL>[^\s>]*)).*?>";
```
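Dropped into the question's loop, the pattern might be used as in the sketch below. The HTML string is a hypothetical stand-in for `htmlFile`, and the options are copied from the question (`Multiline` only affects `^` and `$`, so it makes no difference here). .NET allows the same group name in both alternatives, so `Groups["URL"]` works whichever one matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from the answer; same options as in the question.
var pattern = @"<(LINK).*?HREF=(?:([""'])(?<URL>.*?)\2|(?<URL>[^\s>]*)).*?>";
var regExOptions = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline;
var linkRegEx = new Regex(pattern, regExOptions);

// Hypothetical HTML snippet standing in for the question's htmlFile string.
var htmlFile = "<html><head>\n" +
               "<link rel=\"stylesheet\" href=\"site.css\">\n" +
               "<LINK REL=icon HREF=favicon.ico>\n" +
               "<link rel='alternate' href='feed.xml' type='application/rss+xml'>\n" +
               "</head></html>";

var urls = new List<string>();
foreach (Match match in linkRegEx.Matches(htmlFile))
{
    // Only one of the two URL alternatives participates in a given match,
    // and its capture is exposed under the shared name "URL".
    urls.Add(match.Groups["URL"].Value);
}

Console.WriteLine(string.Join(", ", urls));
```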
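This will match:

- `<`
- `LINK`, captured in group 1
- any characters, non-greedily
- `HREF=`
- either:
  - `"` or `'`, captured in group 2
  - any characters, non-greedily, captured in group `URL`
  - whatever was captured in group 2 (`\2` is a back-reference)
- or:
  - zero or more characters other than whitespace or `>`, greedily, captured in group `URL`
- any characters, non-greedily
- `>`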
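The group numbering and the back-reference can be checked directly. In .NET, unnamed groups are numbered first, left to right, so `(LINK)` is group 1 and the quote is group 2. The sample tags below are hypothetical; the first puts an apostrophe inside a double-quoted value to show that `\2` only accepts the same quote character that opened the value.

```csharp
using System;
using System.Text.RegularExpressions;

var linkRegEx = new Regex(
    @"<(LINK).*?HREF=(?:([""'])(?<URL>.*?)\2|(?<URL>[^\s>]*)).*?>",
    RegexOptions.IgnoreCase);

// Hypothetical input: an apostrophe inside a double-quoted value.
var quoted = linkRegEx.Match("<link href=\"it's.css\">");
Console.WriteLine(quoted.Groups[1].Value);     // link (the tag name as written)
Console.WriteLine(quoted.Groups[2].Value);     // " (the opening quote)
Console.WriteLine(quoted.Groups["URL"].Value); // it's.css, since \2 requires a closing "

// Unquoted value: group 2 does not take part in the match.
var bare = linkRegEx.Match("<LINK HREF=Baz>");
Console.WriteLine(bare.Groups[2].Success);     // False
Console.WriteLine(bare.Groups["URL"].Value);   // Baz
```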
This will correctly handle inputs like:

- `<LINK HREF="Foo">` produces `url = "Foo"`
- `<LINK HREF='Bar'>` produces `url = "Bar"`
- `<LINK HREF=Baz>` produces `url = "Baz"`
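A small check of the three inputs listed above might look like this. The fourth case is a hypothetical addition showing why the unquoted alternative excludes whitespace: `[^\s>]*` stops at the space before the next attribute.

```csharp
using System;
using System.Text.RegularExpressions;

var linkRegEx = new Regex(
    @"<(LINK).*?HREF=(?:([""'])(?<URL>.*?)\2|(?<URL>[^\s>]*)).*?>",
    RegexOptions.IgnoreCase);

// The three inputs from the answer, plus a hypothetical unquoted value
// followed by another attribute.
var cases = new (string Input, string Expected)[]
{
    ("<LINK HREF=\"Foo\">", "Foo"),
    ("<LINK HREF='Bar'>", "Bar"),
    ("<LINK HREF=Baz>", "Baz"),
    ("<LINK HREF=Qux REL=icon>", "Qux"),
};

var failures = 0;
foreach (var (input, expected) in cases)
{
    var url = linkRegEx.Match(input).Groups["URL"].Value;
    Console.WriteLine($"{input} -> url = \"{url}\"");
    if (url != expected) failures++;
}
```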