Reputation: 7680
I need to match the following patterns:
#/a/b/c/d
#/a/b/&1/d
#/a/b/c[&1]/d
The following rules apply:
a) # is the number sign and then its a path. Pretty much anything can be in the path segments. For &1 and []'s, they follow certain rules.
b) &1 (or any number) has to be in a path segment by itself
c) [&1] has to follow at least one character and has to end the segment, only [&l1] is allowed for now
So, I came up with the following:
^#((/[^/&]+)|(/&\\d+)|(/[^/]+\\[&1\\]))+
Seems to be working, but my profiler is showing that its a bottleneck. Is there a way to improve performance or restructure it in a more optimized way? I don't need to capture or group anything, I just need to know if its a valid path.
Upvotes: 2
Views: 76
Reputation: 26907
Running some quick tests, this is the fastest. I added some brackets to the negative character classes to exclude paths that have extraneous brackets in them. It is faster without them.
var pattern = "^#(?:/(?:&\\d+|[^/&[\\]]+\\[&1]|[^/&[\\]]+))+$";
var REc = new Regex(pattern, RegexOptions.Compiled);
Changing the order depending on what type of segment is most frequent can be faster - this is faster on my test data that mostly has alphanumeric segments:
var pattern2 = "^#(?:/(?:[^/&[\\]]+|&\\d+|[^&/[\\]]+\\[&1]))+$";
Tested using REc.IsMatch(bs)
This is faster if brackets are okay in segments:
var pattern = "^#(?:/(?:&\\d+|[^/]+\\[&1]|[^/&]+))+$";
Upvotes: 2
Reputation: 520968
One thing you could try would be to tell the regex engine not to capture anything:
^#(?:(?:/[^/&]+)|(?:/&\\d+)|(?:/[^/]+\\[&1\\]))+
By marking each group (?: ... )
we tell the engine to ignore this group.
Upvotes: 0