performance issues with finding nth occurence of a character with a regular expression

Question

I have a regex to find the nth occurrence of a character in a string, here's the code:

 public static int NthIndexOf(this string target, string value, int n)
    {
        Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");

        if (m.Success)
        {
            return m.Groups[2].Captures[n - 1].Index;
        }
        else
        {
             return -1;
        }
    }

Now, I have 1594 entries in this string, with 1593 semicolons. If I write:

tempstring.NthIndexOf(";", 1593)

The answer comes back immediately and correctly. If I give it anything over 1594 it hangs. Does anyone know how to fix this?

Test Case

 string holder = "test;test2;test3";
        string test = "";
        for (int i = 0; i < 600; i++)
        {
            test += holder;
        }
        int index = test.NthIndexOf(";", 2000);

This takes a very long time. Change 600 to 6 and it is very fast. Make 2000 to 1700 and it is very fast as well.

Why is my regular expression so slow?

Daniel LeCheminant · Accepted Answer

If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace you method with something simple like

public static int NthIndexOf(this string target, char testChar, int n)
{
   int count = 0;

   for(int i=0; i



and use that.  It should have far fewer limitations.

As for why your original regex is going slow, here's what I suspect:


For your fast case, it's working because it can find a match on it's first pass through (with each group matching exactly one character)
For the slow case is because it can't find a match (and won't ever find one, because there aren't enough semicolons to satisfy the regex), but it  recursively tries every possible way to break up the string (which is a really big operation)

performance issues with finding nth occurence of a character with a regular expression

Test Case

Answers (2)

Related Questions