Reputation: 2574
I'm having performance issues when parsing messages using multiple Regex commands (about 20 in all).
To help efficiency I have:
1) Ordered the Regex commands by likelihood of matching.
2) Ensured I break out of the matching loop once a match is found.
I am wondering if there are any other improvements I can make or if there is a better approach to my problem.
Calling Code:
bool resolved = false;
Match regexMatch = null;
foreach (var resolverKvp in _resolvers)
{
    if (resolverKvp.Key.Pattern.IsMatch(topicName))
    {
        regexMatch = resolverKvp.Key.Pattern.Match(topicName);
        // Use the kvp value
        resolved = true;
        break;
    }
}
Sample of Regex Commands iterated through:
<add messagename="BackLayVolumeCurrencyOddsFormat" pattern="^.*/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$" assembly="Client.Messaging"
type="Client.Messaging.TopicMessages.BackLayVolumeCurrencyOddsFormatResolver">
</add>
<add messagename="Market1" pattern="^.*/M/E_([0-9]+)$" assembly="Client.Messaging"
type="Client.Messaging.TopicMessages.Market1Resolver">
</add>
Data Example:
regex 1:
6/E/E_1/E/E_511010/E/E_527901/E/E_631809/E/E_631810/E/E_631811/M/E_1379656/MEI/MDP/10_10_1_USD_3
regex 2:
1/E/E_1/E/E_100004/E/E_190539/E/E_632113/E/E_632120/M/E_1380084
Thank you in advance.
Upvotes: 2
Views: 130
Reputation: 25563
One thing to try is to avoid .* in your expressions. In your examples it does not seem to be needed, and it is not free, especially when the expression does not match. A very quick and dirty test showed a factor of two difference between your pattern 1 and the equivalent without the ^.* part.
Furthermore, using .* more than once in an expression may lead to catastrophic backtracking.
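As a rough illustration, the second pattern from the question can be written without the leading ^.* and still capture the same group, because the trailing $ keeps the match pinned to the end of the input. The class and field names below are made up for this sketch, and it makes no timing claims; note that the overall match value differs (whole string versus suffix), so this only applies if the resolvers read the capture groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name, for illustration only.
public static class UnanchoredPatternSketch
{
    // Pattern 2 from the question, unchanged.
    public static readonly Regex WithPrefix = new Regex(@"^.*/M/E_([0-9]+)$");

    // Same pattern with the leading ^.* removed; $ still anchors the end.
    public static readonly Regex WithoutPrefix = new Regex(@"/M/E_([0-9]+)$");

    public static void Main()
    {
        // Sample topic taken from the question ("regex 2" data example).
        const string topic =
            "1/E/E_1/E/E_100004/E/E_190539/E/E_632113/E/E_632120/M/E_1380084";

        Match withPrefix = WithPrefix.Match(topic);
        Match withoutPrefix = WithoutPrefix.Match(topic);

        // Both variants expose the same first capture group.
        Console.WriteLine(withPrefix.Groups[1].Value);
        Console.WriteLine(withoutPrefix.Groups[1].Value);
    }
}
```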
Upvotes: 1
Reputation: 12544
The first (small) thing I notice is that the matching regex is executed twice: once to check for a match, then again to retrieve it. I'm not sure how much performance difference the IsMatch call makes, but you could combine the check and the retrieval like this:
regexMatch = resolverKvp.Key.Pattern.Match(topicName);
if (regexMatch.Success)
{
    // Use the kvp value
    resolved = true;
    break;
}
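For completeness, here is a self-contained sketch of the whole loop using a single Match call per pattern. The question does not show the type of _resolvers, so the list of pattern/name pairs and the TryResolve method below are assumptions for illustration; the two patterns and the sample topic are taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical class; stands in for the code that owns _resolvers.
public static class SingleMatchSketch
{
    // Assumed shape: each pattern paired with its message name, most likely first.
    private static readonly List<KeyValuePair<Regex, string>> Resolvers =
        new List<KeyValuePair<Regex, string>>
        {
            new KeyValuePair<Regex, string>(
                new Regex(@"^.*/M/E_([0-9]+)/MEI/MDP/(\d{1,3})_(\d{1,3})_(\d+)_([a-zA-Z]{3})_([1-3])$"),
                "BackLayVolumeCurrencyOddsFormat"),
            new KeyValuePair<Regex, string>(
                new Regex(@"^.*/M/E_([0-9]+)$"),
                "Market1"),
        };

    // Runs each pattern once and stops at the first successful match.
    public static bool TryResolve(string topicName, out string messageName, out Match regexMatch)
    {
        foreach (var resolverKvp in Resolvers)
        {
            regexMatch = resolverKvp.Key.Match(topicName);
            if (regexMatch.Success)
            {
                messageName = resolverKvp.Value;
                return true;
            }
        }

        // Nothing matched: hand back an empty (unsuccessful) match.
        messageName = string.Empty;
        regexMatch = Match.Empty;
        return false;
    }

    public static void Main()
    {
        const string topic =
            "1/E/E_1/E/E_100004/E/E_190539/E/E_632113/E/E_632120/M/E_1380084";

        if (TryResolve(topic, out string name, out Match match))
        {
            Console.WriteLine(name + ": " + match.Groups[1].Value);
        }
    }
}
```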
Upvotes: 3