Reputation: 16656
I have around 100 Regex calls in one class. Each call covers a different type of data in a text protocol, but I have many files to process, and according to profiling, regex takes 88% of my code's execution time.
Much of the code looks like this:
{
Match m_said = Regex.Match(line, @"(.*) said,", RegexOptions.IgnoreCase);
if (m_said.Success)
{
string playername = m_said.Groups[1].Value;
// some action
return true;
}
}
{
Match ma = Regex.Match(line, @"(.*) is connected", RegexOptions.IgnoreCase);
if (ma.Success)
{
string playername = ma.Groups[1].Value;
// some action
return true;
}
}
{
Match ma = Regex.Match(line, @"(.*): brings in for (.*)", RegexOptions.IgnoreCase);
if (ma.Success)
{
string playername = ma.Groups[1].Value;
long amount = Detect_Value(ma.Groups[2].Value, line);
// some action
return true;
}
}
Is there any way to replace Regex with some other, faster solution?
Upvotes: 7
Views: 9247
Reputation: 5851
I don't know if you can re-use the expressions, or if the method is called multiple times, but if so you should precompile your regular expressions. Try this:
private static readonly Regex xmlRegex = new Regex("YOUR EXPRESSION", RegexOptions.Compiled);
In your sample, the expression has to be set up again each time the method runs, which is unnecessary because the pattern is constant. Stored in a static readonly field, it is built and compiled only once. The disadvantage is that the first time you access the expression, it is a bit slower.
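Applied to the "said" pattern from the question, a minimal sketch could look like this. The class and method names (LineParser, TryParseSaid) are made up for illustration; only the pattern comes from the question.

```csharp
using System.Text.RegularExpressions;

public static class LineParser
{
    // Built once, when the class is first used, then reused for every line.
    private static readonly Regex SaidRegex =
        new Regex(@"(.*) said,", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: returns true and sets playerName for a "said" line.
    public static bool TryParseSaid(string line, out string playerName)
    {
        Match m = SaidRegex.Match(line);
        playerName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Whether RegexOptions.Compiled pays off depends on how many lines you process, so it is worth measuring with and without it.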
Upvotes: 2
Reputation: 336468
Aside from precompiling your regex, you could gain (probably much more) performance benefits by writing a more precise regex. In this respect, .* is almost always a bad choice:
(.*) is connected
means: First match the entire string (that's the .* part), then backtrack one character at a time until it's possible to match " is connected".
Now unless the string is very short or " is connected" appears very close to the end of the string, that's a lot of backtracking, which costs time.
So if you can refine what an allowed match is, you can improve performance.
For example, if only alphanumeric characters are allowed, then (\w+) is connected will be good. If it's any kind of non-whitespace characters, then use (\S+) is connected. Etc., depending on the rules for a valid match.
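As a rough sketch of a tightened pattern: the version below assumes player names never contain whitespace and always start the line, which is why it uses \S+ and a ^ anchor. Both are assumptions about your protocol, not something the question states, and the names ConnectedParser and TryParseConnected are made up.

```csharp
using System.Text.RegularExpressions;

public static class ConnectedParser
{
    // Assumption: the name is a single whitespace-free token at the start
    // of the line. ^ means a non-matching line is rejected after one attempt
    // instead of being retried from every position.
    private static readonly Regex ConnectedRegex =
        new Regex(@"^(\S+) is connected", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool TryParseConnected(string line, out string playerName)
    {
        Match m = ConnectedRegex.Match(line);
        playerName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Note that this is stricter than the original: a line like "foo bar is connected" matched (.*) is connected but is rejected here.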
In your concrete example, you don't appear to be doing anything with the captured match, so you could even drop regex altogether and just look for a fixed substring. Which method will be the fastest in the end depends a lot on your actual input and requirements.
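If you go the fixed-substring route but still need the text in front of the marker, something like the following might do. It is only a sketch: SubstringParser and TryParseConnected are hypothetical names, and OrdinalIgnoreCase is my stand-in for RegexOptions.IgnoreCase, which may treat some non-ASCII characters differently.

```csharp
using System;

public static class SubstringParser
{
    private const string Marker = " is connected";

    public static bool TryParseConnected(string line, out string playerName)
    {
        // LastIndexOf mirrors the greedy (.*): it picks the last occurrence
        // of the marker, so the name is everything before that point.
        int idx = line.LastIndexOf(Marker, StringComparison.OrdinalIgnoreCase);
        if (idx < 0)
        {
            playerName = string.Empty;
            return false;
        }
        playerName = line.Substring(0, idx);
        return true;
    }
}
```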
Upvotes: 4
Reputation: 11465
For regexps that are tested in a loop, it is often faster to precompile them outside of the loop and just test them inside the loop.
You need to declare the different regexps first with their respective patterns, and only call Match() with the text to test in a second step.
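The two steps could be sketched like this, using two of the patterns from the question. ProtocolParser, Rules, and the "kind:player" output strings are invented for the example; your real code would run its own action per rule.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProtocolParser
{
    // Step 1: declare every regex once, outside the loop.
    private static readonly (string Kind, Regex Pattern)[] Rules =
    {
        ("said", new Regex(@"(.*) said,", RegexOptions.IgnoreCase | RegexOptions.Compiled)),
        ("connected", new Regex(@"(.*) is connected", RegexOptions.IgnoreCase | RegexOptions.Compiled)),
    };

    // Step 2: inside the loop, only call Match() on the prebuilt instances.
    public static List<string> Parse(IEnumerable<string> lines)
    {
        var events = new List<string>();
        foreach (string line in lines)
        {
            foreach (var (kind, pattern) in Rules)
            {
                Match m = pattern.Match(line);
                if (!m.Success) continue;
                events.Add(kind + ":" + m.Groups[1].Value);
                break; // first matching rule wins, like the early return in the question
            }
        }
        return events;
    }
}
```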
Upvotes: 8
Reputation: 3624
I know Regex can do a lot of things, but here is a benchmark of Regex vs char.Split vs string.Split:
http://www.dotnetperls.com/split in the Benchmarks section
Upvotes: 1
Reputation: 10316
You could try compiling the Regex beforehand, or consider combining all the individual regular expressions into one (monster) Regex:
Match m_said = Regex.Match(line,
@"(.*) (said|(is connected)|...|...),",
RegexOptions.IgnoreCase);
You can then test the second capturing group to determine which type of match occurred.
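A possible way to write that test down, limited to two of the message types and using named groups instead of numbered ones so the check does not depend on group order. This is a sketch, not a drop-in: CombinedParser and Classify are made-up names, and I moved the comma into the "said" alternative because only that message has one in the question.

```csharp
using System.Text.RegularExpressions;

public static class CombinedParser
{
    // One pattern for several message types; each alternative has its own
    // named group so we can ask which one took part in the match.
    private static readonly Regex Combined = new Regex(
        @"(?<player>.*) (?:(?<said>said,)|(?<connected>is connected))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns "said", "connected", or "none" and sets playerName on a match.
    public static string Classify(string line, out string playerName)
    {
        Match m = Combined.Match(line);
        playerName = m.Success ? m.Groups["player"].Value : string.Empty;
        if (!m.Success) return "none";
        return m.Groups["said"].Success ? "said" : "connected";
    }
}
```

The leading (.*) still backtracks here, so it is worth measuring whether one large pattern actually beats several small ones on your data.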
Upvotes: 1