Reputation: 12004
I have a regex to find the nth
occurrence of a character in a string, here's the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
{
return m.Groups[2].Captures[n - 1].Index;
}
else
{
return -1;
}
}
Now, I have 1594 entries in this string, with 1593 semicolons. If I write:
tempstring.NthIndexOf(";", 1593)
The answer comes back immediately and correctly. If I give it anything over 1594 it hangs. Does anyone know how to fix this?
string holder = "test;test2;test3";
string test = "";
for (int i = 0; i < 600; i++)
{
test += holder;
}
int index = test.NthIndexOf(";", 2000);
This takes a very long time. Change 600 to 6 and it is very fast. Make 2000 to 1700 and it is very fast as well.
Why is my regular expression so slow?
Upvotes: 0
Views: 1469
Reputation: 51081
If you're really only looking for character repetitions, and not string repetitions, then you should be able to replace you method with something simple like
public static int NthIndexOf(this string target, char testChar, int n)
{
int count = 0;
for(int i=0; i<target.Length; i++)
{
if(target[i] == testChar)
{
count++;
if(count == n) return i;
}
}
return -1;
}
and use that. It should have far fewer limitations.
As for why your original regex is going slow, here's what I suspect:
Upvotes: 9
Reputation: 655379
Try to use a more distinct and efficient regular expression:
"^(?:[^" + value + "]*" + value + "){" + (n - 1) + "}([^" + value + "]*)
This will build the following regular expression for tempstring.NthIndexOf(";", 1593)
:
^(?:[^;]*;){1592}([^;]*)
But this will only work for single characters as separator.
Another approach would be to step through each character and count the occurences of the character you were looking for.
Upvotes: 2