Dgameman1

Reputation: 378

C# Regex capturing everything

I only want the text between the parentheses, but for some reason I'm getting the whole match.

This is the regex I wrote:

<a href='ete(.+)'>det

This is the string:

</td>
<td>
<a href='ete/d1460852470.html'>detailed list #11</a> (20.94KB)
</td>
<td>
392
</td>
<td>
4/17 12:21:10 am
</td>
</tr>
<tr>
<td>
<a href='ete/1460845272.html'>ete #5</a> (6.71KB)
</td>
<td>
<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)
</td>
<td>
372
</td>
<td>
4/16 10:21:12 pm
</td>
</tr>
<tr>
<td>
<a href='ete/1460839272.html'>ete #2</a> (6.62KB)
</td>
<td>
<a href='ete/d1460839272.html'>detailed list #2</a> (19.4KB)
</td>
<td>
366
</td>
<td>
4/16 8:41:12 pm
</td>
</tr>
<tr>
<td>
<a href='ete/1460830870.html'>ete #8</a> (6.72KB)
</td>
<td>
<a href='ete/d1460830870.html'>detailed list #8</a> (19.76KB)
</td>

I only want the text between the / and the closing ', e.g. d1460852470.html for the first link.

That isn't what happens right now. Instead I get back what looks like a three-dimensional array.

This is the C# code that https://myregextester.com/index.php generates:

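      using System;
      using System.Text.RegularExpressions;

      // Placeholder input; in practice this is the HTML shown above.
      String sourcestring = "source string to match with pattern";
      Regex re = new Regex(@"<a href='ete(.+)'>det");
      MatchCollection mc = re.Matches(sourcestring);
      int mIdx = 0;
      foreach (Match m in mc)
      {
          // Prints every group of every match, keyed by match index and group name.
          for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
          {
              Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
          }
          mIdx++;
      }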

Upvotes: 0

Views: 31

Answers (2)

yaakov

Reputation: 5851

Your answer is already in one of the match groups: mc[n].Groups[1] gives you just your capture group, while mc[n].Groups[0] gives you all the text that matched the regular expression, not just the capture group.
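A minimal sketch of that, assuming three lines copied from the question's HTML as input (the variable name html is just illustrative). It keeps the question's pattern unchanged and prints Groups[0] and Groups[1] for each match side by side.

```csharp
using System;
using System.Text.RegularExpressions;

// Three lines copied from the question's HTML, joined with newlines.
string html =
    "<a href='ete/d1460852470.html'>detailed list #11</a> (20.94KB)\n" +
    "<a href='ete/1460845272.html'>ete #5</a> (6.71KB)\n" +
    "<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)";

// The question's pattern, unchanged: one unnamed capture group.
// By default "." does not match "\n", so each match stays on a single line,
// and the "ete #5" line is skipped because it is not followed by '>det.
Regex re = new Regex(@"<a href='ete(.+)'>det");
MatchCollection mc = re.Matches(html);

for (int n = 0; n < mc.Count; n++)
{
    // Groups[0] is everything the pattern matched; Groups[1] is only the (.+) part.
    Console.WriteLine("mc[{0}].Groups[0] = {1}", n, mc[n].Groups[0].Value);
    Console.WriteLine("mc[{0}].Groups[1] = {1}", n, mc[n].Groups[1].Value);
}
```

Groups[1] comes back as /d1460852470.html and /d1460845272.html. The leading / is still there because it sits inside the parentheses.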

If you want to be pedantic, you can switch to a lookahead and lookbehind, e.g. (?<=<a href='ete).+(?='>det), to only match the inner text.
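A minimal sketch of the lookaround version, assuming two of the "detailed list" lines from the question as input. Because the lookbehind and lookahead are zero-width, the match itself (m.Value, i.e. Groups[0]) is already just the inner text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string html =
    "<a href='ete/d1460852470.html'>detailed list #11</a> (20.94KB)\n" +
    "<a href='ete/d1460845272.html'>detailed list #5</a> (19.76KB)";

// (?<=...) and (?=...) must match around the text but are not part of the match.
Regex re = new Regex(@"(?<=<a href='ete).+(?='>det)");

var inner = new List<string>();
foreach (Match m in re.Matches(html))
{
    // No capture group needed: the whole match is the inner text.
    inner.Add(m.Value);
}

Console.WriteLine(string.Join(", ", inner));
```

This relies on .NET supporting lookbehind; the result is the same /d1460852470.html-style text that Groups[1] gives with the original pattern.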

Upvotes: 0

Jens Meinecke

Reputation: 2940

Change the regex to:

Regex re = new Regex(@"<a href='ete([^']+)'>det");

and you should get what you are after.

In other words: inside the group, match one or more characters that are not a single quote (so the group stops at the closing quote of the href), and then match '>det after that.
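A minimal sketch of why [^']+ matters, assuming a hypothetical input line with two links on it (not taken from the question's HTML). Greedy .+ backtracks only as far as the last '>det on the line, while [^']+ cannot cross a quote. The third pattern is a hypothetical variant that moves the / outside the group, which yields exactly the text between / and '.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input: two links on one line (not from the question).
string line = "<a href='ete/d1.html'>detailed</a> <a href='ete/d2.html'>detailed</a>";

// Collects Groups[1] of every match of the given pattern against 'line'.
List<string> Captures(string pattern)
{
    var result = new List<string>();
    foreach (Match m in Regex.Matches(line, pattern))
        result.Add(m.Groups[1].Value);
    return result;
}

// Greedy .+ runs to the LAST '>det, so one capture spans both links.
List<string> greedy = Captures(@"<a href='ete(.+)'>det");
// [^']+ stops at the first quote, so each href is captured separately.
List<string> negated = Captures(@"<a href='ete([^']+)'>det");
// Hypothetical variant: with the / outside the group it is dropped from the capture.
List<string> noSlash = Captures(@"<a href='ete/([^']+)'>det");

Console.WriteLine("greedy:   " + string.Join(" | ", greedy));
Console.WriteLine("negated:  " + string.Join(" | ", negated));
Console.WriteLine("no slash: " + string.Join(" | ", noSlash));
```

The greedy pattern returns a single capture, /d1.html'>detailed</a> <a href='ete/d2.html, while the negated class returns /d1.html and /d2.html (d1.html and d2.html for the variant). A lazy .+? would also stop at the first '>det, but [^'] states the intent more directly.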

Upvotes: 1
