Julian Gold

Reputation: 1286

Regular expression oddity

I have a regular expression to check for valid identifiers in a scripting language. An identifier starts with a letter or underscore, followed by zero or more letters, underscores, digits and $ symbols. However, if I call

Util.IsValidIdentifier( "hello\n" );

it returns true. My regex is

const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_\$]*$";

so how does the "\n" get through?

Upvotes: 4

Views: 96

Answers (3)

Casimir et Hippolyte

Reputation: 89557

Your result is true with hello\n because you don't need to escape the $ inside a character class, thus the backslash is matched because you have a backslash (seen as literal) inside the character class.

Try this:

const string IDENTIFIER_REGEX = @"^[A-Za-z_][A-Za-z0-9_$]*$";

Since you are testing variable names that are in one line, you can use $ as end of the string.

Upvotes: 0

ctn

Reputation: 2930

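The $ matches at the end of the string or just before a final newline (and, with RegexOptions.Multiline, at the end of every line). To match only the very end of the text, use \z instead; it does not depend on RegexOptions.Multiline. You might also want to use \A instead of ^ so that the start anchor always means the beginning of the text, not of a line.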

Also, you don't need to escape the $ in the character class.
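
A minimal sketch of that change, assuming Util.IsValidIdentifier is a thin wrapper around Regex.IsMatch (the original method body was not posted, so the one below is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class Util
{
    // \A and \z anchor to the absolute start and end of the input,
    // so a trailing newline is no longer tolerated.
    // The $ inside the character class is a literal and needs no escape.
    private const string IDENTIFIER_REGEX = @"\A[A-Za-z_][A-Za-z0-9_$]*\z";

    // Hypothetical body: the question did not show the real implementation.
    public static bool IsValidIdentifier(string candidate)
    {
        return candidate != null && Regex.IsMatch(candidate, IDENTIFIER_REGEX);
    }
}
```

With this pattern, Util.IsValidIdentifier( "hello" ) should still return true, while Util.IsValidIdentifier( "hello\n" ) should return false.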

Upvotes: 5

Jack

Reputation: 133567

Because $ is a valid metacharacter which means the end of the string (or the end of the line, just before the newline). From msdn:

$: The match must occur at the end of the string or before \n at the end of the line or string.

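The \$ inside your character class is already a literal dollar sign; it is the unescaped $ at the end of the pattern that lets the newline through. Replace that one with \z if you want to match only at the very end of the string.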
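
The difference between the two end anchors can be checked with a small comparison. This is only an illustrative sketch: the class name AnchorDemo, its method names, and the simplified [a-z]+ pattern are made up for the demonstration and are not part of the question's code.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // $ (without RegexOptions.Multiline) succeeds at the end of the string
    // or immediately before a final "\n".
    public static bool MatchesWithDollar(string input)
    {
        return Regex.IsMatch(input, @"^[a-z]+$");
    }

    // \z succeeds only at the very end of the string.
    public static bool MatchesWithLowerZ(string input)
    {
        return Regex.IsMatch(input, @"^[a-z]+\z");
    }
}
```

For the input "hello\n", MatchesWithDollar returns true and MatchesWithLowerZ returns false, which is the behaviour described in the MSDN quote.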

Upvotes: 1
