PzYon

Reputation: 2993

Performance and readability of RegEx using positive look ahead

I am validating the following strings with regular expressions in C#:

[/1/2/]
[/1/2/];[/3/4/5/]
[/1/22/333/];[/1/];[/9999/]

Basically, it's one or more groups in square brackets, separated by semicolons (but with no semicolon at the end). Each group consists of one or more numbers, each followed by a slash, with a leading slash right after the opening bracket. No other characters are allowed.

These are two alternatives:

^(\[\/(\d+\/)+\](;(?=\[)|$))+$

^(\[\/(\d+\/)+\];)*(\[\/(\d+\/)+\])$

The first version uses a positive look ahead and the second version duplicates part of the pattern.
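A minimal sketch of how the two patterns could be checked side by side in C#. The class name `PatternCheck` and the extra invalid samples (trailing semicolon, missing slash, empty group, empty string) are assumptions added for illustration; only the two patterns and the first three strings come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The two patterns from the question, unchanged.
    public const string LookAhead  = @"^(\[\/(\d+\/)+\](;(?=\[)|$))+$";
    public const string Duplicated = @"^(\[\/(\d+\/)+\];)*(\[\/(\d+\/)+\])$";

    public static void Main()
    {
        string[] samples =
        {
            "[/1/2/]",                      // valid (from the question)
            "[/1/2/];[/3/4/5/]",            // valid (from the question)
            "[/1/22/333/];[/1/];[/9999/]",  // valid (from the question)
            "[/1/2/];",                     // assumed invalid: trailing semicolon
            "[/1/2]",                       // assumed invalid: last number lacks a slash
            "[//]",                         // assumed invalid: no digits
            ""                              // assumed invalid: empty input
        };

        foreach (var s in samples)
        {
            bool a = Regex.IsMatch(s, LookAhead);
            bool b = Regex.IsMatch(s, Duplicated);
            Console.WriteLine($"{s,-30} lookahead={a} duplicated={b}");
        }
    }
}
```

Running both patterns over the same inputs is a quick way to confirm that they accept and reject the same strings before comparing them on readability or speed.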

Both regexes seem to be OK and do what they should, but neither is very nice to read. ;)

Does anybody have an idea for a better, faster and easier-to-read solution? When I was playing around in regex101, I noticed that the second version uses more steps. Why?

At the same time I realized that it would be nice to count the steps used in a C#-RegEx. Is there any way to achieve this?

Upvotes: 3

Views: 179

Answers (2)

user1919238

There is nothing particularly wrong with the two options you suggest. They are not that complicated as regexes go, and they should be understandable enough, as long as you put an appropriate comment in your code.

In general, I think it is preferable to avoid look-arounds, unless they are necessary or greatly simplify the regex--they make it harder to figure out what is going on, since they add a non-linear element to the logic.

The relative performance of regexes this simple is not something to worry about, unless you are performing a huge number of operations or discover a performance problem with your code. Still, understanding the relative performance of different patterns may be instructive.

Upvotes: 1

Wiktor Stribiżew

Reputation: 626738

You can use a single regex to validate all these strings:

^\[/(\d+/)+\](?:;\[/(\d+/)+\])*$

See regex demo

To make it easier to read, use a VERBOSE flag (inline (?x) or RegexOptions.IgnorePatternWhitespace):

var rx = @"(?x)^               # Start of string
           \[/             # Literal `[/`
           (\d+/)+         # 1 or more sequences of 1 or more digits followed by `/`
           \]              # Closing `]`
           (?:             # A non-capturing group start
             ;             # a semi-colon delimiter
              \[/(\d+/)+\] # Same as the first part of the regex
           )*              # 0 or more occurrences
           $               # End of string
";
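As a hedged sketch, a commented pattern like this can be passed to `Regex` with `RegexOptions.IgnorePatternWhitespace`. The block also shows one possible way to avoid writing the bracket group twice, by building the pattern from a shared fragment. The names `GroupValidator`, `Group`, `Verbose` and `Composed` are hypothetical, and the non-capturing `(?:\d+/)+` is a small variation on the pattern above, not the answer's exact text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupValidator
{
    // One bracket group: "[/", then one or more "digits/" runs, then "]".
    private const string Group = @"\[/(?:\d+/)+\]";

    // Verbose form: whitespace and #-comments inside the pattern are ignored.
    public static readonly Regex Verbose = new Regex(@"
        ^                          # start of string
        \[/ (?:\d+/)+ \]           # first group
        (?: ; \[/ (?:\d+/)+ \] )*  # zero or more further groups, each after a semicolon
        $                          # end of string",
        RegexOptions.IgnorePatternWhitespace);

    // Composed form: the same pattern, built from the shared fragment.
    public static readonly Regex Composed = new Regex("^" + Group + "(?:;" + Group + ")*$");

    public static void Main()
    {
        foreach (var s in new[] { "[/1/2/]", "[/1/22/333/];[/1/];[/9999/]", "[/1/2/];" })
        {
            Console.WriteLine($"{s,-30} verbose={Verbose.IsMatch(s)} composed={Composed.IsMatch(s)}");
        }
    }
}
```

Whether the verbose or the composed form reads better is a matter of taste; the composed form keeps the group definition in one place, while the verbose form keeps the whole pattern visible at once.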

To measure the performance of a .NET regex (rather than the number of steps), you can use the regexhero.net service. With the 3 sample strings above, my regex runs at about 217K iterations per second, which is faster than either of your regexes.
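If an online service is not an option, a rough local comparison can be sketched with `System.Diagnostics.Stopwatch`. This measures elapsed time, not regex steps. The class name `RegexTiming`, the `Measure` helper, the iteration count and the use of `RegexOptions.Compiled` are all assumptions for illustration. Results will vary by machine and runtime, so treat this as an indication rather than a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // The three sample strings from the question.
    public static readonly string[] Samples =
    {
        "[/1/2/]",
        "[/1/2/];[/3/4/5/]",
        "[/1/22/333/];[/1/];[/9999/]"
    };

    // Times `iterations` passes over all samples for one pattern.
    public static TimeSpan Measure(string pattern, int iterations)
    {
        var regex = new Regex(pattern, RegexOptions.Compiled);

        // Warm-up pass so one-time compilation cost is not included in the timing.
        foreach (var s in Samples) regex.IsMatch(s);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (var s in Samples)
            {
                if (!regex.IsMatch(s))
                    throw new InvalidOperationException("Pattern rejected a valid sample: " + s);
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 100000; // arbitrary choice for this sketch
        var patterns = new (string Name, string Pattern)[]
        {
            ("look-ahead", @"^(\[\/(\d+\/)+\](;(?=\[)|$))+$"),
            ("duplicated", @"^(\[\/(\d+\/)+\];)*(\[\/(\d+\/)+\])$"),
            ("single",     @"^\[/(\d+/)+\](?:;\[/(\d+/)+\])*$")
        };

        foreach (var p in patterns)
        {
            Console.WriteLine($"{p.Name,-12} {Measure(p.Pattern, iterations).TotalMilliseconds:F1} ms");
        }
    }
}
```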

Upvotes: 2
