Reputation: 2206
I have the following code to attempt to extract the content of li tags.
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
Match liMatches = liRegex.Match(blah);
if (liMatches.Success)
{
    foreach (var group in liMatches.Groups)
    {
        Console.WriteLine(group);
    }
}
Console.ReadLine();
The Regex started much simpler and without the multiline option, but I've been tweaking it to try to make it work.
I want the results foo, bar and oof, but instead I get <li>foo</li> and foo.
On top of this, it seems to work fine in Regex101: https://regex101.com/r/jY6rnz/1
Any thoughts?
Upvotes: 1
Views: 82
Reputation: 27609
First, as mentioned in the comments, I think you should be parsing HTML with a proper HTML parser such as the HtmlAgilityPack. Moving on to actually answer your question, though:
The problem is that liRegex.Match(blah) only returns a single match. What you want is liRegex.Matches(blah), which returns all matches.
So your usage would be:
var liMatches = liRegex.Matches(blah);
foreach (Match match in liMatches)
{
    Console.WriteLine(match.Groups[1].Value);
}
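For the parser route mentioned at the start of this answer, here is a minimal sketch. It does not use the HtmlAgilityPack; it swaps in System.Xml.Linq from the standard library, which only works because the sample in the question happens to be well-formed XML. The class and method names (LiParserSketch, ExtractListItems) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LiParserSketch
{
    // Assumption: the input is well-formed XML. XDocument.Parse throws an
    // XmlException on typical real-world HTML (unclosed tags, &nbsp;, ...),
    // which is where a real HTML parser is needed instead.
    public static List<string> ExtractListItems(string markup)
    {
        XDocument doc = XDocument.Parse(markup);

        // Collect the text content of every <li> element, in document order.
        return doc.Descendants("li")
                  .Select(li => li.Value)
                  .ToList();
    }
}
```

Calling LiParserSketch.ExtractListItems(blah) on the string from the question should give foo, bar and oof. For anything that is not strictly well-formed, treat this as a stand-in for a proper HTML parser rather than a replacement.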
Upvotes: 3
Reputation: 271565
Your regex produces multiple matches when matched against blah. The Match method only returns the first match, which is the foo one. You are printing all groups in that first match, which gives you (1) the whole match and (2) group 1 of the match.
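To make that concrete, here is a small sketch that collects every group of the first match only, which is what the loop in the question effectively does. The names FirstMatchDemo and GroupsOfFirstMatch are hypothetical helpers, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Returns the value of every group in the FIRST match only.
    // Groups[0] is always the whole match; Groups[1] is the first capture group.
    public static List<string> GroupsOfFirstMatch(string input, string pattern)
    {
        var values = new List<string>();
        Match first = Regex.Match(input, pattern);
        if (first.Success)
        {
            foreach (Group g in first.Groups)
            {
                values.Add(g.Value);
            }
        }
        return values;
    }
}
```

With the input and pattern from the question, this returns two entries: <li>foo</li> (the whole match) and foo (group 1). The non-capturing (?:...) groups do not show up in Groups at all.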
If you want to get foo, bar and oof, you should print group 1 of each match. To do this, first get all the matches using Matches, then iterate over the MatchCollection and print Groups[1]:
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
MatchCollection liMatches = liRegex.Matches(blah);
// Cast<Match>() needs "using System.Linq;"; "foreach (Match match in liMatches)" also works.
foreach (var match in liMatches.Cast<Match>())
{
    Console.WriteLine(match.Groups[1]);
}
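As a side note, once Matches is used, the leading (?:.)*? and trailing (?:.?)* are not needed, because Matches scans the whole input on its own. RegexOptions.Multiline only changes how ^ and $ behave, so it has no effect on this pattern. A possible simplification, sketched under the assumption that the li tags have no attributes and are not nested (LiRegexSketch and ExtractListItems are illustrative names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiRegexSketch
{
    // Singleline makes '.' match '\n' as well, so an item whose text
    // spans several lines is still captured. This is an added assumption;
    // the original code used RegexOptions.Multiline.
    private static readonly Regex LiRegex =
        new Regex(@"<li>(.*?)</li>", RegexOptions.Singleline);

    public static List<string> ExtractListItems(string html)
    {
        var items = new List<string>();
        foreach (Match match in LiRegex.Matches(html))
        {
            // Groups[1] is the text between <li> and </li>.
            items.Add(match.Groups[1].Value);
        }
        return items;
    }
}
```

This is still a regex over HTML, so it will break on things like <li class="x"> or nested lists; it is only meant to show that the extra wrapping in the pattern can go.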
Upvotes: 2