SledgeHammer
SledgeHammer

Reputation: 7680

Can this Regex be optimized?

I need to match the following patterns:

#/a/b/c/d
#/a/b/&1/d
#/a/b/c[&1]/d

The following rules apply:

a) # is the number sign and then its a path. Pretty much anything can be in the path segments. For &1 and []'s, they follow certain rules.
b) &1 (or any number) has to be in a path segment by itself
c) [&1] has to follow at least one character and has to end the segment, only [&l1] is allowed for now

So, I came up with the following:

^#((/[^/&]+)|(/&\\d+)|(/[^/]+\\[&1\\]))+

Seems to be working, but my profiler is showing that its a bottleneck. Is there a way to improve performance or restructure it in a more optimized way? I don't need to capture or group anything, I just need to know if its a valid path.

Upvotes: 2

Views: 76

Answers (2)

NetMage
NetMage

Reputation: 26907

Running some quick tests, this is the fastest. I added some brackets to the negative character classes to exclude paths that have extraneous brackets in them. It is faster without them.

var pattern = "^#(?:/(?:&\\d+|[^/&[\\]]+\\[&1]|[^/&[\\]]+))+$";
var REc = new Regex(pattern, RegexOptions.Compiled);

Changing the order depending on what type of segment is most frequent can be faster - this is faster on my test data that mostly has alphanumeric segments:

var pattern2 = "^#(?:/(?:[^/&[\\]]+|&\\d+|[^&/[\\]]+\\[&1]))+$";

Tested using REc.IsMatch(bs)

This is faster if brackets are okay in segments:

var pattern = "^#(?:/(?:&\\d+|[^/]+\\[&1]|[^/&]+))+$";

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520968

One thing you could try would be to tell the regex engine not to capture anything:

^#(?:(?:/[^/&]+)|(?:/&\\d+)|(?:/[^/]+\\[&1\\]))+

By marking each group (?: ... ) we tell the engine to ignore this group.

Upvotes: 0

Related Questions