user1990553

Reputation: 763

What's the difference between these regexes?

  1. (\d+|) vs (\d+)?
  2. [\w\W] vs [\d\D] vs .

Is there any difference between these regular expressions? Which one should I choose?

I'm using JavaScript.

Upvotes: 1

Views: 1824

Answers (3)

nhahtdh

Reputation: 56809

The second one is quite interesting, and I would like to say something about it:

  • [\w\W] and [\d\D] are equivalent, and they are also equivalent to [\s\S]. \W is the complement of \w, and the same applies to the \D/\d pair and the \S/\s pair. Therefore, when a shorthand and its complement are put together in one class, the class matches any character without exception.

    They are usually used when the flavor has no construct to "match any character, without exception". JavaScript before ES2018 (which added the s flag) is one example of such a case. JavaScript also has a lesser-known and highly confusing construct for this, [^], which is usually invalid in other flavors.

  • Dot . generally matches any character except the newline \n. Depending on the language, it may exclude more characters.

    In Java, it excludes \n, \r, \u0085, \u2028, and \u2029, so . is equivalent to [^\n\r\u0085\u2028\u2029].

    In JavaScript, dot . excludes \r, \u2028, and \u2029 in addition to \n, so . is equivalent to [^\n\r\u2028\u2029].

    Some languages have a mode that makes . match any character, without exception. It is called DOTALL mode in Java and Python, and single-line mode in C# and Perl.
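A quick way to check these claims in JavaScript is to test each pattern against a handful of sample characters. The samples include the four characters JavaScript treats as line terminators, plus U+0085, which Java excludes from . but JavaScript does not. This is a minimal sketch: the sample set is an arbitrary choice, not an exhaustive proof.

```javascript
// Classes built from a shorthand and its complement, plus JavaScript's [^],
// should each match every character, including line terminators.
const anyCharPatterns = [/^[\w\W]$/, /^[\d\D]$/, /^[\s\S]$/, /^[^]$/];

// Sample characters: ordinary ones, the four JavaScript line terminators, and U+0085.
const samples = ["a", "7", " ", "\t", "\n", "\r", "\u2028", "\u2029", "\u0085"];

for (const re of anyCharPatterns) {
  const allMatch = samples.every((ch) => re.test(ch));
  console.log(`${re.source}: matches every sample? ${allMatch}`);
}

// Without the s flag, dot rejects exactly the four line terminators,
// so it should agree with [^\n\r\u2028\u2029] on every sample.
const dot = /^.$/;
const dotClass = /^[^\n\r\u2028\u2029]$/;
for (const ch of samples) {
  const code = ch.codePointAt(0).toString(16).padStart(4, "0");
  console.log(`U+${code}: dot=${dot.test(ch)} class=${dotClass.test(ch)}`);
}
```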

The behavior of . varies from language to language. They generally agree that \n should be excluded in "normal" mode, but they differ in which other characters, if any, they also exclude.
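Since ES2018, JavaScript also has such a mode: the s (dotAll) flag. The sketch below assumes a runtime with ES2018 support. It compares default ., . with the s flag, and the [\s\S] workaround on a two-line string. The sample text is an arbitrary choice.

```javascript
const text = "first line\nsecond line";

// Default mode: . stops at the newline, so only the first line matches.
const defaultMode = text.match(/^.*/)[0];

// ES2018 s (dotAll) flag: . also matches line terminators.
const dotAllMode = text.match(/^.*/s)[0];

// Pre-ES2018 workaround: a class that matches any character.
const classWorkaround = text.match(/^[\s\S]*/)[0];

console.log(JSON.stringify(defaultMode));     // "first line"
console.log(JSON.stringify(dotAllMode));      // "first line\nsecond line"
console.log(JSON.stringify(classWorkaround)); // "first line\nsecond line"
console.log(/./s.dotAll, /./.dotAll);         // true false
```

Where the s flag is available, it states the intent more directly than a complement-pair class. [\s\S] remains useful for code that must run on older engines.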

Upvotes: 4

Anirudh Ramanathan

Reputation: 46728

[\w\W] and [\d\D] are used in languages like JavaScript (before ES2018) that have no dotall option. They match all characters, including newlines, unlike ., which matches everything but a newline.

[\w\W] or [\d\D]  -> matches everything, including newline characters
               .  -> matches everything except newline characters, unless
                     's' (dotall modifier) is specified
          (\d+|)  -> matches 1 or more digits OR the empty string at any position;
                     it could simply be written as '(\d*)'
          (\d+)?  -> matches the same text, but if no digits are present the
                     group does not participate (its value is undefined)
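The sketch below shows this difference in JavaScript by running unanchored versions of the three patterns on input with and without leading digits. The object name and inputs are illustrative choices.

```javascript
// The two forms from the question plus the (\d*) rewrite, unanchored.
const digitPatterns = {
  alternation: /(\d+|)/,
  star: /(\d*)/,
  optional: /(\d+)?/,
};

for (const input of ["42", "abc"]) {
  for (const [name, re] of Object.entries(digitPatterns)) {
    const m = re.exec(input); // never null: every pattern can match the empty string
    // m[0] is the overall match; m[1] is the group's text, or undefined
    // when the group did not take part in the match.
    console.log(`${name} on ${JSON.stringify(input)}: index=${m.index}`,
      `match=${JSON.stringify(m[0])}`, `group=${JSON.stringify(m[1])}`);
  }
}
```

On "abc", all three match the empty string at index 0. (\d+|) and (\d*) capture "", but (\d+)? leaves group 1 undefined, which matters if later code does something like m[1].length.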

Upvotes: 5

melpomene

Reputation: 85767

You didn't say which language you're using, so I'm going to assume Perl.

  1. (\d+|) is equivalent to (\d*). It matches a sequence of 0 or more digits and captures the result into $1. (\d+)? matches the same text, but the whole group is optional: if it matches digits, it puts them in $1; otherwise $1 will be undef (you could rewrite it as (?:(\d+)|) if you want to eliminate the ?).

  2. [\w\W] and [\d\D] are equivalent, matching any character. . is equivalent to [^\n] by default (matching any character but newline). If you really want to match any character, you should use . and specify the /s flag, which makes . match any character.
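This answer assumes Perl, but the question is about JavaScript. In JavaScript, a group that does not participate yields undefined instead of undef. For the question's (\d+)?, the ?-free spelling is (?:(\d+)|). The sketch below checks in JavaScript that both forms capture the same value with and without digits present. The inputs are arbitrary.

```javascript
// (\d+)? versus the ?-free rewrite (?:(\d+)|).
const optionalGroup = /(\d+)?/;
const nonCapturingAlt = /(?:(\d+)|)/;

for (const input of ["2024", "none"]) {
  const a = optionalGroup.exec(input)[1];
  const b = nonCapturingAlt.exec(input)[1];
  // Both are "2024" for the first input and undefined for the second.
  console.log(JSON.stringify(input), a, b, a === b);
}
```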

Upvotes: 2
