Magda Furman

Reputation: 21

The same regular expression runs with different execution times in .NET

I am working on a project where I use regexes heavily. The regular expressions are quite complicated, and I have to set an appropriate timeout to stop the execution so that it doesn't keep trying to match the string for a long time.

The problem is that running the same compiled regular expression on the same string takes a different amount of time on each run, varying from 17 ms to 59 ms.

Do you have any idea why this is the case? I am measuring the run time using Stopwatch like this:

for (int i = 0; i < 15; i++)
{
    sw.Start();
    regex.IsMatch(message);
    sw.Stop();
    Debug.WriteLine(sw.ElapsedMilliseconds);
    sw.Reset();
}

For reference, I am using the default .NET regular expression library in System.Text.RegularExpressions.


Following the suggestions in the comments, I modified the code as follows:

List<long> results = new List<long>(); 
for (int i = 0; i < 150; i++)
{
    sw.Start();
    for (int j = 0; j < 20; j++ )
    { 
        regex.IsMatch(message);
    }
    sw.Stop();
    results.Add(sw.ElapsedMilliseconds);
    sw.Reset();
}
Debug.WriteLine(results.Max());
Debug.WriteLine(results.Average());
Debug.WriteLine(results.Min());

and the output for this was:

790
469,086666666667
357

The difference is still very significant for me.

Upvotes: 2

Views: 139

Answers (2)

Wiktor Stribiżew

Reputation: 626929

Since you say you are using RegexOptions.Compiled, please refer to the regex performance tips from David Gutierrez's blog:

In this case, we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to startup, but yields 30% better runtime performance. There are even more costs for compilation that should be mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that's not memory that you'll ever get back... The bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.

That means that the first time the regex match runs, this additional work ("compile time") is performed, and all subsequent executions skip that preparation.

However, beginning with .NET 2.0, the caching behavior changed a bit:

In the .NET Framework 2.0, only regular expressions used in static method calls are cached. By default, the last 15 regular expressions are cached, although the size of the cache can be adjusted by setting the value of the CacheSize property.
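A minimal sketch of how the one-time cost could be observed, assuming a console program; the pattern, the input string, and the class name are hypothetical and chosen only for illustration. It times the construction of a compiled Regex separately from the first few IsMatch calls, so the start-up work is not mistaken for matching time:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class WarmUpDemo
{
    static void Main()
    {
        // Hypothetical pattern and input, used only for illustration.
        string pattern = @"\b\d{3}-\d{4}\b";
        string input = "call 555-1234 now";

        // Time the construction of the compiled regex on its own.
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, RegexOptions.Compiled);
        sw.Stop();
        Console.WriteLine($"construction: {sw.Elapsed.TotalMilliseconds:F3} ms");

        // Time each of the first few calls individually.
        for (int i = 0; i < 5; i++)
        {
            sw.Restart();
            bool matched = regex.IsMatch(input);
            sw.Stop();
            Console.WriteLine($"call {i}: {sw.Elapsed.TotalMilliseconds:F3} ms (matched: {matched})");
        }
    }
}
```

If the first call turns out noticeably slower than the rest, it should probably be excluded from any timing that is meant to reflect steady-state matching.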

Upvotes: 2

comdiv

Reputation: 951

This is a common situation on any managed platform (Java, .NET): the runtime does work behind the scenes (garbage collection, for example), and we run on multitasking operating systems (Windows, Linux), so such tests are not exact measurements. You think you are testing the regex itself, but you are testing .NET, Windows, and your antivirus at the same time.

One valid approach is to execute the regex 50-1000 times, sum the time, and compute the average duration. For example, rewrite the test as:

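sw.Start();
for (int i = 0; i < 1000; i++)
{
    regex.IsMatch(message);
}
sw.Stop();
// Use Elapsed.TotalMilliseconds (a double): ElapsedMilliseconds / 1000 is integer division and truncates.
Debug.WriteLine(sw.Elapsed.TotalMilliseconds / 1000);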

and I think your results will be much more stable. You will still get a range of values, for example [15 ms .. 18 ms], for the reasons described above.

If you want a really precise measurement (though, sorry, your question suggests you don't really need one), you need to use a profiler, which will give you the exact time spent inside the regex call and nothing else.
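The averaging idea can be taken one step further. The following is only a sketch under assumptions: the pattern, the input, the batch sizes, and the class name are hypothetical. It runs an untimed warm-up first, then times several batches and reports the minimum, median, and maximum per-call time, so a single disturbed batch (a garbage collection or another process) does not dominate the result:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class AveragedTiming
{
    static void Main()
    {
        // Hypothetical pattern and input, used only for illustration.
        var regex = new Regex(@"\b\d{3}-\d{4}\b", RegexOptions.Compiled);
        string message = "call 555-1234 now";

        const int batches = 20;
        const int callsPerBatch = 1000;

        // Untimed warm-up so one-time start-up costs are not measured.
        for (int i = 0; i < callsPerBatch; i++)
        {
            regex.IsMatch(message);
        }

        var perCallMs = new double[batches];
        for (int b = 0; b < batches; b++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < callsPerBatch; i++)
            {
                regex.IsMatch(message);
            }
            sw.Stop();
            // Floating-point division keeps sub-millisecond precision.
            perCallMs[b] = sw.Elapsed.TotalMilliseconds / callsPerBatch;
        }

        Array.Sort(perCallMs);
        Console.WriteLine($"min:    {perCallMs[0]:F6} ms");
        // With an even number of batches this picks the upper of the two middle values.
        Console.WriteLine($"median: {perCallMs[batches / 2]:F6} ms");
        Console.WriteLine($"max:    {perCallMs[batches - 1]:F6} ms");
    }
}
```

The minimum and median are usually more informative than the maximum here, since the slowest batch mostly reflects whatever else the machine was doing at that moment.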

Upvotes: 1
