Totero

Reputation: 2574

Performance Issues when Parsing Messages using multiple Regex Commands

I'm having performance issues when parsing messages using multiple Regex commands (about 20 in all).

To improve efficiency, I have:

1) Ordered the Regex commands by likelihood.

2) Ensured I break out of the matching loop once a match is found.

I am wondering if there are any other improvements I can make or if there is a better approach to my problem.

Calling Code:

        bool resolved = false;
        Match regexMatch = null;

        foreach (var resolverKvp in _resolvers)
        {
            if (resolverKvp.Key.Pattern.IsMatch(topicName))
            {
                regexMatch = resolverKvp.Key.Pattern.Match(topicName);
                //  Use the kvp value
                resolved = true;
                break;
            }
        }

Sample of Regex Commands iterated through:

    <add messagename="BackLayVolumeCurrencyOddsFormat" pattern="^.*/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$" assembly="Client.Messaging"
      type="Client.Messaging.TopicMessages.BackLayVolumeCurrencyOddsFormatResolver">
    </add>

    <add messagename="Market1" pattern="^.*/M/E_([0-9]+)$" assembly="Client.Messaging"
      type="Client.Messaging.TopicMessages.Market1Resolver">
    </add>

Data Example:

regex 1:
6/E/E_1/E/E_511010/E/E_527901/E/E_631809/E/E_631810/E/E_631811/M/E_1379656/MEI/MDP/10_10_1_USD_3

regex 2:
1/E/E_1/E/E_100004/E/E_190539/E/E_632113/E/E_632120/M/E_1380084

Thank you in Advance.

Upvotes: 2

Views: 130

Answers (2)

Jens

Reputation: 25563

One thing to try is to avoid .* in your expressions. In your examples it does not seem to be needed, and it is not free, especially not if the expression does not match. A very quick and dirty test showed a factor of two between your pattern 1 and the equivalent without the ^.* part.

Furthermore, using .* more than once in an expression may lead to catastrophic backtracking.
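As a rough illustration of the suggestion above, here is a minimal sketch that compares pattern 1 from the question with the same pattern minus the leading ^.* part. The class and method names (UnanchoredPatternSketch, Captures) are made up for this example. It only checks that both variants capture the same groups for the sample topic; it does not measure timing, so any speed-up would still need to be confirmed with your own data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnanchoredPatternSketch
{
    // Pattern 1 exactly as given in the question, with the leading ^.* prefix.
    public static readonly Regex WithPrefix = new Regex(
        @"^.*/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$");

    // Same pattern with ^.* dropped; the trailing $ still anchors the end of the topic.
    public static readonly Regex WithoutPrefix = new Regex(
        @"/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$");

    // Hypothetical helper: returns groups 1..n joined by '|', or null when there is no match.
    public static string Captures(Regex pattern, string topic)
    {
        Match m = pattern.Match(topic);
        if (!m.Success)
        {
            return null;
        }

        var values = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            values[i - 1] = m.Groups[i].Value;
        }
        return string.Join("|", values);
    }

    public static void Main()
    {
        // Sample topic for regex 1, taken from the question.
        const string topic =
            "6/E/E_1/E/E_511010/E/E_527901/E/E_631809/E/E_631810/E/E_631811/M/E_1379656/MEI/MDP/10_10_1_USD_3";

        Console.WriteLine(Captures(WithPrefix, topic));
        Console.WriteLine(Captures(WithoutPrefix, topic));
    }
}
```

Note that the whole-match value (group 0) differs between the two variants, since the second one no longer consumes the start of the topic; only the numbered groups are compared here.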

Upvotes: 1

Me.Name

Reputation: 12544

The first (small) thing I notice is that the matching regex is executed twice: once by IsMatch to check for a match, then again by Match to retrieve it. I'm not sure how much the extra IsMatch call costs, but you could combine the check and the retrieval:

    regexMatch = resolverKvp.Key.Pattern.Match(topicName);
    if (regexMatch.Success)
    {
        //  Use the kvp value
        resolved = true;
        break;
    }
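For completeness, a self-contained sketch of the whole loop with a single Match call per pattern could look like the following. The Resolvers list is a simplified, hypothetical stand-in for the question's _resolvers collection (it maps a Regex directly to a message name instead of using Key.Pattern), and SingleMatchSketch and Resolve are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SingleMatchSketch
{
    // Hypothetical stand-in for _resolvers: pattern -> message name, ordered by likelihood.
    private static readonly List<KeyValuePair<Regex, string>> Resolvers =
        new List<KeyValuePair<Regex, string>>
        {
            new KeyValuePair<Regex, string>(
                new Regex(@"^.*/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$"),
                "BackLayVolumeCurrencyOddsFormat"),
            new KeyValuePair<Regex, string>(
                new Regex(@"^.*/M/E_([0-9]+)$"),
                "Market1"),
        };

    // Returns the message name of the first matching resolver, or null if none matches.
    public static string Resolve(string topicName, out Match regexMatch)
    {
        foreach (var resolverKvp in Resolvers)
        {
            // One Match call both tests and captures, replacing IsMatch followed by Match.
            regexMatch = resolverKvp.Key.Match(topicName);
            if (regexMatch.Success)
            {
                return resolverKvp.Value;
            }
        }

        regexMatch = null;
        return null;
    }

    public static void Main()
    {
        // Sample topic for regex 2, taken from the question.
        Match match;
        string name = Resolve(
            "1/E/E_1/E/E_100004/E/E_190539/E/E_632113/E/E_632120/M/E_1380084", out match);
        Console.WriteLine(name + " " + match.Groups[1].Value); // prints "Market1 1380084"
    }
}
```

Returning from inside the loop plays the same role as the resolved flag plus break in the original code.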

Upvotes: 3
