Richard

Reputation: 8280

Why does this regex work in JavaScript, but not C#?

Expression

var regex = new Regex(@"{([A-z]*)(([^]|:)((\\:)|[^:])*?)(([^]|:)((\\:)|[^:])*?)}");

Breakdown

The expression is [crudely] designed to find tokens within an input, using the format: {name[:pattern[:format]]}, where the pattern and format are optional.

{
  ([A-z]*) // name
  (([^]|:)((\\:)|[^:])*?) // regex pattern
  (([^]|:)((\\:)|[^:])*?) // format
}

Additionally, the expression attempts to ignore escaped colons, thus allowing for strings such as {Time:\d+\:\d+\:\d+:hh\:mm\:ss}

Question

When I test the pattern on RegExr.com, everything works well enough. When I use the same pattern in C#, however, the input fails to match. Why?

(Any advice on general improvements to the expression is very welcome too.)

Upvotes: 4

Views: 138

Answers (1)

Wiktor Stribiżew

Reputation: 626699

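The [^] pattern is only valid in JavaScript, where it matches "not nothing", i.e. any character (although in ES5 it does not match characters outside the BMP). In .NET, a ] that directly follows [^ is taken as a literal member of the character class, so the class does not close where you expect and the rest of the pattern is parsed very differently. In C#, it is easy to match any character with . by passing the RegexOptions.Singleline option. JavaScript did not support that modifier before ES2018 (which added the s flag), but you can match any character with the [\s\S] workaround pattern.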
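As a rough illustration of the point above, the following JavaScript sketch compares [^], [\s\S], and a plain . against a string that contains a newline. The variable names are made up for this example.

```javascript
// [^] is an empty negated class: in JavaScript it matches any single character.
const emptyNegated = /a[^]b/;
// [\s\S] (whitespace or non-whitespace) is the portable way to say the same thing.
const portable = /a[\s\S]b/;
// A plain . does not match line terminators unless the s flag is set.
const dotOnly = /a.b/;

const input = "a\nb";
const results = {
  emptyNegated: emptyNegated.test(input),
  portable: portable.test(input),
  dotOnly: dotOnly.test(input),
};
console.log(results); // { emptyNegated: true, portable: true, dotOnly: false }
```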

So, the minimum change needed to make the pattern work in both regex flavors is to replace ([^]|:) with [\s\S]. The : alternative is unnecessary, since [\s\S] already matches a colon.

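Also, do not use [A-z] as a shortcut to match ASCII letters: the range also covers the characters that sit between Z and a in ASCII, namely [, \, ], ^, _ and `. Either use [a-zA-Z], or use [a-z] and pass a case-insensitive modifier.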
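A quick way to see the difference is to test the six ASCII characters between Z and a against both classes. This is only a small sketch, and the names extras, loose and strict are chosen here for illustration.

```javascript
// The ASCII characters between 'Z' (0x5A) and 'a' (0x61).
const extras = ["[", "\\", "]", "^", "_", "`"];

const loose = /^[A-z]$/;     // the range from the question
const strict = /^[A-Za-z]$/; // letters only

const looseMatches = extras.filter((ch) => loose.test(ch));
const strictMatches = extras.filter((ch) => strict.test(ch));

console.log(looseMatches.length, strictMatches.length); // 6 0
```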

So, you might consider writing the expression as

{([A-Za-z]*)([\s\S]((\\:)|[^:])*?)([\s\S]((\\:)|[^:])*?)}
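For reference, here is a minimal JavaScript sketch that runs this expression against the sample token from the question. The group indexes follow from counting the opening parentheses. Note that the pattern and format captures include their leading colon, because [\s\S] consumes it.

```javascript
// The expression above, written as a JavaScript regex literal.
const tokenRegex = /{([A-Za-z]*)([\s\S]((\\:)|[^:])*?)([\s\S]((\\:)|[^:])*?)}/;

// String.raw keeps the backslashes of the sample token intact.
const sample = String.raw`{Time:\d+\:\d+\:\d+:hh\:mm\:ss}`;

const match = tokenRegex.exec(sample);
const token = {
  name: match[1],    // "Time"
  pattern: match[2], // leading colon plus the pattern, escaped colons kept
  format: match[5],  // leading colon plus the format, escaped colons kept
};
console.log(token);
```

Stripping the leading colon and unescaping \: would still have to happen after matching.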

See a .NET regex test and a JS regex test.

There are certainly other possible enhancements here, such as removing redundant groups or adding support for any escape sequence (not just escaped colons), but those are beyond the scope of the question.

Upvotes: 6
