Zabba

Reputation: 65467

Is there a common/standard subset of Regular Expressions?

Do the "control characters" (metacharacters) used in regular expressions differ much between regex implementations (e.g. in Ruby, Java, C#, sed)?

For example, in Ruby \D means "not a digit"; does it mean the same in Java, C#, and sed? I guess what I'm asking is: is there a "standard" for regexes that all regex parsers support?

If not, is there a common subset that should be learned and mastered first, picking up the parser-specific features as they're encountered?

Upvotes: 8

Views: 789

Answers (2)

DigitalRoss

Reputation: 146053

There is a common core which is very simple. It corresponds to the regular expressions as implemented in the original software tools such as ed, grep, sed, and awk. This is worth learning, because the other formats are all supersets of this one.

.        match any character
[abc]    match a, b, or c
[^abc]   match a character other than a, b, or c
[a-c]    match the range from a to c
^        match the beginning of the line
$        match the end of the line
*        match zero or more of the preceding character or bracket expression
\(...\)  group for use as a back-reference 
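As a quick check of the list above, here is each core construct exercised in Python's re module, used only as one example of a modern flavor. Two assumptions to note: the group is written (...) rather than the classic \(...\), and ^ matches only at the start of the whole string unless re.MULTILINE is given (the classic tools work line by line). The helper name all_matches is hypothetical.

```python
import re

def all_matches(pattern, text, flags=0):
    """Return every non-overlapping whole match (group 0) of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]

print(all_matches(r"h.t", "hat hot h t"))              # ['hat', 'hot', 'h t']  .
print(all_matches(r"[abc]", "cab"))                    # ['c', 'a', 'b']        [abc]
print(all_matches(r"[^abc]", "cabd"))                  # ['d']                  [^abc]
print(all_matches(r"[a-c]", "xbz"))                    # ['b']                  [a-c]
print(all_matches(r"^fo", "foo\nfox"))                 # ['fo']   ^ = start of string by default
print(all_matches(r"^fo", "foo\nfox", re.MULTILINE))   # ['fo', 'fo']  ^ = start of each line
print(all_matches(r"o$", "foo fo"))                    # ['o']                  $
print(all_matches(r"lo*l", "ll lol lool"))             # ['ll', 'lol', 'lool']  *
print(all_matches(r"(ab)\1", "abab abxx"))             # ['abab']  group reused by \1
```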

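I've left out POSIX character classes inside bracket expressions (such as [[:digit:]]) because they are rarely used and they aren't in the common subset. In most other flavors, unescaped parentheses are grouping metacharacters by default; only in the classic expressions do they need to be escaped as \(...\).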
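To make the parenthesis difference concrete, here is a minimal sketch of a translator from the classic core subset to Python re syntax. The function bre_to_python is hypothetical, written only for this illustration: it turns \( \) into ( ), escapes characters such as ( + ? | that are plain literals in the classic syntax but operators in Python, and escapes backslashes inside bracket expressions (where the classic syntax treats them literally). It assumes the input uses only the constructs in the list above and deliberately ignores POSIX classes, \{m,n\} intervals, and the position-dependent literal meanings of ^, $ and *.

```python
import re

def bre_to_python(bre):
    """Hypothetical helper: rewrite a pattern that uses only the classic core
    (. [ ] ^ $ * \\( \\) \\1..\\9) into Python re syntax.

    Not handled (assumed absent): POSIX classes like [[:digit:]], intervals
    \\{m,n\\}, and the position-dependent literal meanings of ^, $ and *.
    """
    out, i, n = [], 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\":
            if i + 1 == n:
                raise ValueError("trailing backslash")
            nxt = bre[i + 1]
            # \( and \) group in the classic syntax; other escapes (\. \* \1) keep their meaning.
            out.append(nxt if nxt in "()" else "\\" + nxt)
            i += 2
        elif c == "[":
            # Copy a bracket expression. A backslash inside [...] is literal in the
            # classic syntax, so it must be escaped for Python.
            j = i + 1
            out.append("[")
            if j < n and bre[j] == "^":
                out.append("^")
                j += 1
            if j < n and bre[j] == "]":      # a leading ] is a literal member
                out.append("\\]")
                j += 1
            while j < n and bre[j] != "]":
                out.append("\\" + bre[j] if bre[j] in "\\[" else bre[j])
                j += 1
            if j == n:
                raise ValueError("unterminated bracket expression")
            out.append("]")
            i = j + 1
        elif c in "()+?{}|":
            # Metacharacters in modern flavors, but ordinary characters here.
            out.append("\\" + c)
            i += 1
        else:
            out.append(c)   # . ^ $ * and ordinary characters pass through unchanged
            i += 1
    return "".join(out)

print(bre_to_python(r"\(ab\)*c"))   # (ab)*c
print(bre_to_python("f(x)"))        # f\(x\)
```

The same text, f(x), means "f followed by a group containing x" in most modern flavors but "the literal characters f(x)" in the classic syntax, which is why the translation has to escape it.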

Upvotes: 1

Oded

Reputation: 499002

See the list of basic syntax on regular-expressions.info, and a comparison of the different "flavors".
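As one small illustration of why the flavor comparison matters, even \d and \D (the example in the question) are not identical everywhere. The sketch below uses Python's re only as an example engine: for str patterns, \d matches any Unicode decimal digit unless re.ASCII is passed. Java's \d, by contrast, is [0-9] by default unless UNICODE_CHARACTER_CLASS is set. The variable name three is just for illustration.

```python
import re

three = "\u0663"  # ARABIC-INDIC DIGIT THREE: a Unicode decimal digit, but not 0-9

print(re.findall(r"\D", "a1b2"))                   # ['a', 'b']: \D means "not a digit"
print(bool(re.fullmatch(r"\d", three)))            # True: str patterns use Unicode digits
print(bool(re.fullmatch(r"\d", three, re.ASCII)))  # False: re.ASCII restricts \d to [0-9]
print(bool(re.fullmatch(r"\D", three, re.ASCII)))  # True: \D is the complement of \d
```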

Upvotes: 8
