Svisstack

Reputation: 16656

Faster replacement for Regex

I have a class with around 100 Regex calls. Each call handles a different type of data in a text protocol. I have many files to process, and according to my profiling, Regex accounts for 88% of my code's execution time.

Much of the code looks like this:

{
  Match m_said = Regex.Match(line, @"(.*) said,", RegexOptions.IgnoreCase);
  if (m_said.Success)
  {
    string playername = m_said.Groups[1].Value;
    // some action
    return true;
  }
}

{
  Match ma = Regex.Match(line, @"(.*) is connected", RegexOptions.IgnoreCase);
  if (ma.Success)
  {
    string playername = ma.Groups[1].Value;
    // some action
    return true;
  }
}
{
  Match ma = Regex.Match(line, @"(.*): brings in for (.*)", RegexOptions.IgnoreCase);
  if (ma.Success)
  {
    string playername = ma.Groups[1].Value;
    long amount = Detect_Value(ma.Groups[2].Value, line);
    // some action
    return true;
  }
}

Is there any way to replace Regex with some other, faster solution?

Upvotes: 7

Views: 9247

Answers (5)

Myrtle

Reputation: 5851

I don't know if you can re-use the expressions, or if the method is called multiple times, but if so you should precompile your regular expressions. Try this:

private static readonly Regex xmlRegex = new Regex("YOUR EXPRESSION", RegexOptions.Compiled);

In your sample, each time the method is used it 'compiles' the expression, but this is unnecessary as the expression is a constant. Declared as a static field, it is compiled only once. The disadvantage is that the first time you access the expression, it is a bit slower.
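As a rough sketch of how this could look for one of the patterns from the question: the class name `LineParsers` and the helper `TryParseConnected` are made up for illustration, and only the pattern itself comes from the original code.

```csharp
using System.Text.RegularExpressions;

public static class LineParsers
{
    // Constructed once when the class is first used. RegexOptions.Compiled
    // trades a slower first use for faster matching afterwards.
    private static readonly Regex ConnectedRegex = new Regex(
        @"(.*) is connected",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: reports whether the line matches and, if so,
    // hands back the captured player name.
    public static bool TryParseConnected(string line, out string playerName)
    {
        Match m = ConnectedRegex.Match(line);
        playerName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Each of the roughly 100 patterns would get its own static field in the same way, so no Regex object is built inside the parsing method.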

Upvotes: 2

Tim Pietzcker

Reputation: 336468

Aside from precompiling your regex, you could gain (probably much more) performance benefits by writing a more precise regex. In this respect, .* is almost always a bad choice:

(.*) is connected means: First match the entire string (that's the .* part), then backtrack one character at a time until it's possible to match is connected.

Now unless the string is very short or is connected appears very close to the end of the string, that's a lot of backtracking which costs time.

So if you can refine what an allowed match is, you can improve performance.

For example, if only alphanumeric characters are allowed, then (\w+) is connected will be good. If it's any kind of non-whitespace characters, then use (\S+) is connected. Etc., depending on the rules for a valid match.
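A minimal sketch of the tightened pattern, assuming (this is not stated in the question) that a player name is a single run of non-whitespace characters at the start of the line. The `^` anchor and the names `ConnectedMatcher` and `TryGetPlayer` are additions for illustration only.

```csharp
using System.Text.RegularExpressions;

public static class ConnectedMatcher
{
    // Assumption: the name contains no whitespace and starts the line.
    // \S+ cannot run past the first space, so there is far less to backtrack
    // over than with (.*).
    private static readonly Regex Connected = new Regex(
        @"^(\S+) is connected",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static bool TryGetPlayer(string line, out string playerName)
    {
        Match m = Connected.Match(line);
        playerName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Note the trade-off: a line such as "Bob Smith is connected" no longer matches, so this only works if the protocol really forbids spaces in names.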

In your concrete example, you don't appear to be doing anything with the captured match, so you could even drop regex altogether and just look for a fixed substring. Which method will be the fastest in the end depends a lot on your actual input and requirements.
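If the player name is needed, a fixed-substring search can still recover it. The following is only a sketch under the assumption that everything before the last " is connected" is the name, which mirrors what the greedy `(.*)` captures; `SubstringParsers` and `TryParseConnected` are hypothetical names.

```csharp
using System;

public static class SubstringParsers
{
    private const string ConnectedMarker = " is connected";

    // Case-insensitive ordinal search for the marker; the text before its
    // last occurrence is treated as the player name.
    public static bool TryParseConnected(string line, out string playerName)
    {
        int idx = line.LastIndexOf(ConnectedMarker, StringComparison.OrdinalIgnoreCase);
        if (idx < 0)
        {
            playerName = string.Empty;
            return false;
        }
        playerName = line.Substring(0, idx);
        return true;
    }
}
```

Whether this beats a well-written regex depends on the input, so it is worth measuring on real log files before converting all the patterns.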

Upvotes: 4

Seki

Reputation: 11465

For regexps that are tested in a loop, it is often faster to precompile them outside of the loop and just test them inside the loop.

You need to declare the different regexps first with their respective patterns and only call the Match() with the text to test in a second step.
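The two steps could be sketched roughly as follows. The rule names ("said", "connected"), the class `ProtocolScanner`, and the method `Classify` are invented for the example; only the two patterns are taken from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProtocolScanner
{
    // Step 1: declare every pattern once, outside any loop.
    private static readonly KeyValuePair<string, Regex>[] Rules =
    {
        new KeyValuePair<string, Regex>("said",
            new Regex(@"(.*) said,", RegexOptions.Compiled | RegexOptions.IgnoreCase)),
        new KeyValuePair<string, Regex>("connected",
            new Regex(@"(.*) is connected", RegexOptions.Compiled | RegexOptions.IgnoreCase)),
    };

    // Step 2: inside the loop, only call Match() on the prebuilt objects.
    // Returns, per line, the name of the first rule that matches, or "none".
    public static List<string> Classify(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            string kind = "none";
            foreach (KeyValuePair<string, Regex> rule in Rules)
            {
                if (rule.Value.Match(line).Success)
                {
                    kind = rule.Key;
                    break;
                }
            }
            result.Add(kind);
        }
        return result;
    }
}
```

In real code the rule table would carry an action per pattern instead of just a name, but the structure stays the same: construction happens once, matching happens per line.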

Upvotes: 8

Guillaume Slashy

Reputation: 3624

I know Regex can do a lot of things, but here is a benchmark comparing Regex.Split with string.Split using char and string delimiters:

http://www.dotnetperls.com/split (see the Benchmarks section)

Upvotes: 1

adelphus

Reputation: 10316

You could try compiling the Regex beforehand, or consider combining all the individual regular expressions into one (monster) Regex:

Match m_said = Regex.Match(line,
            @"(.*) (said,|(is connected)|...|...)",
            RegexOptions.IgnoreCase);

You can then test the second capturing group to determine which type of match occurred.
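One possible way to make that test less fragile is to use named groups, so the check does not depend on group numbering. This is a sketch covering only two of the question's patterns; the group names `player`, `said`, and `connected` and the class `CombinedMatcher` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CombinedMatcher
{
    // Two of the question's patterns merged into one alternation.
    // Each alternative gets its own named group so we can tell which one hit.
    private static readonly Regex Combined = new Regex(
        @"(?<player>.*) (?:(?<said>said,)|(?<connected>is connected))",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Returns "said", "connected", or "none", plus the captured player name.
    public static string Classify(string line, out string playerName)
    {
        Match m = Combined.Match(line);
        if (!m.Success)
        {
            playerName = string.Empty;
            return "none";
        }
        playerName = m.Groups["player"].Value;
        return m.Groups["said"].Success ? "said" : "connected";
    }
}
```

The leading `(.*)` still backtracks here, so a combined pattern is not guaranteed to be faster than separate ones; it mainly saves repeated scans of the same line and should be benchmarked against the alternatives.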

Upvotes: 1
