Reputation: 378
I want only the text between the parentheses, but for some reason it's giving me the whole match.
This is the regex I wrote:
<a href='ete(.+)'>det
This is the string (an excerpt of the HTML):
</td>
<td>
<a href='ete/d1460852470.html'>detailed list #11</a> (20.94KB)
</td>
<td>
392
</td>
<td>
4/17 12:21:10 am
</td>
</tr>
<tr>
<td>
<a href='ete/1460845272.html'>ete #5</a> (6.71KB)
</td>
<td>
<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)
</td>
<td>
372
</td>
<td>
4/16 10:21:12 pm
</td>
</tr>
<tr>
<td>
<a href='ete/1460839272.html'>ete #2</a> (6.62KB)
</td>
<td>
<a href='ete/d1460839272.html'>detailed list #2</a> (19.4KB)
</td>
<td>
366
</td>
<td>
4/16 8:41:12 pm
</td>
</tr>
<tr>
<td>
<a href='ete/1460830870.html'>ete #8</a> (6.72KB)
</td>
<td>
<a href='ete/d1460830870.html'>detailed list #8</a> (19.76KB)
</td>
I only want the text between the / and the ' (for example d1460852470.html).
But that doesn't happen right now: I get back what looks like a multi-dimensional array.
This is the C# code that https://myregextester.com/index.php generates:
using System;
using System.Text.RegularExpressions;

string sourcestring = "source string to match with pattern";
Regex re = new Regex(@"<a href='ete(.+)'>det");
MatchCollection mc = re.Matches(sourcestring);
int mIdx = 0;
foreach (Match m in mc)
{
    // Print every group of every match as [match index][group name] = value
    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
    {
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
    }
    mIdx++;
}
Upvotes: 0
Views: 31
Reputation: 5851
Your answer is already in one of the match groups: mc[n].Groups[1] (or m.Groups[1] inside your foreach loop) gives you just your capture group, while mc[n].Groups[0] gives you all the text that matched the regular expression, not just your capture group.
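A minimal sketch of reading only group 1, assuming C# 9+ top-level statements and using two of the "detailed list" lines from the question as input (the name captured is just illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two lines copied from the question's HTML.
string html =
    "<a href='ete/d1460852470.html'>detailed list #11</a> (20.94KB)\n" +
    "<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)\n";

// The regex from the question; (.+) is capture group 1.
var re = new Regex(@"<a href='ete(.+)'>det");

var captured = new List<string>();
foreach (Match m in re.Matches(html))
{
    // Groups[0] is the whole match, Groups[1] is only the parenthesised part.
    Console.WriteLine($"whole match: {m.Groups[0].Value}");
    Console.WriteLine($"group 1:     {m.Groups[1].Value}");
    captured.Add(m.Groups[1].Value);
}
```

Note that group 1 still starts with the / because the / sits inside the parentheses; move it in front of the group (ete/(.+)) if you don't want it.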
If you want to be pedantic, you can switch to a lookbehind and lookahead, e.g. (?<=<a href='ete).+(?='>det), so that the match itself is only the inner text.
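A rough sketch of the lookaround version, assuming C# 9+ top-level statements and two lines from the question as sample input. Because (?<=...) and (?=...) are zero-width, Match.Value already holds just the inner text, so no group index is needed:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two lines copied from the question's HTML; only the second links to a "detailed list".
string html =
    "<a href='ete/1460845272.html'>ete #5</a> (6.71KB)\n" +
    "<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)\n";

// The lookbehind and lookahead must match, but are not part of Match.Value.
var re = new Regex(@"(?<=<a href='ete).+(?='>det)");

var values = new List<string>();
foreach (Match m in re.Matches(html))
{
    values.Add(m.Value);
    Console.WriteLine(m.Value);
}
```

The first line produces no match because it has no '>det after the link, and . does not cross the newline by default.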
Upvotes: 0
Reputation: 2940
Change the regex to:
Regex re = new Regex(@"<a href='ete([^']+)'>det");
and you should get what you are after.
It says: capture every character that is not the closing quote into the group, then match '>det after that.
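A minimal sketch comparing the two patterns, assuming C# 9+ top-level statements. The input is a hypothetical single-line version of the HTML (both links on one line), which is where a greedy (.+) goes wrong; Captures is a hypothetical helper, not part of .NET:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input: two links from the question joined onto ONE line.
string html =
    "<a href='ete/d1460852470.html'>detailed list #11</a> " +
    "<a href='ete/d1460845272.html'>detailed list #5</a>";

// Hypothetical helper: returns capture group 1 of every match.
List<string> Captures(string pattern, string input)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
    {
        result.Add(m.Groups[1].Value);
    }
    return result;
}

// Greedy .+ runs on to the LAST '>det on the line, giving one over-long capture.
var greedy = Captures(@"<a href='ete(.+)'>det", html);

// [^']+ cannot cross a quote, so each link yields its own match.
var negated = Captures(@"<a href='ete([^']+)'>det", html);

// Putting the / outside the group leaves only the text between / and '.
var fileNames = Captures(@"<a href='ete/([^']+)'>det", html);

Console.WriteLine($"greedy:    {string.Join(" | ", greedy)}");
Console.WriteLine($"negated:   {string.Join(" | ", negated)}");
Console.WriteLine($"fileNames: {string.Join(" | ", fileNames)}");
```

With the question's original line-per-link HTML, both patterns happen to give the same result, because . does not match newlines; [^']+ just doesn't rely on that.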
Upvotes: 1