Reputation: 763
(\d+|) vs (\d+)?
[\w\W] vs [\d\D] vs .
Is there any difference between these regular expressions? Which one should I choose?
I'm using JavaScript.
Upvotes: 1
Views: 1824
Reputation: 56809
The second one is quite interesting, and I would like to say something about it:
[\w\W] and [\d\D] are equivalent, and both are also equivalent to [\s\S]. \W is the complement of the character set \w, and the same holds for the \D and \d pair and the \S and \s pair. Therefore, when both halves of a pair are put together in one character class, the class matches any character without exception.
They are usually used when the flavor has no construct to "match any character, without exception". JavaScript was one such case until ES2018 added the s (dotAll) flag. JavaScript also has a less well-known and highly confusing construct for this, [^], which is usually invalid in other flavors.
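As a quick check of the equivalence described above, here is a minimal sketch that tests each "match anything" class against a lone newline. The names anyCharPatterns and matchesNewline are only illustrative, not part of any API.

```javascript
// Each pattern is anchored so it must match exactly one character.
const anyCharPatterns = {
  "[\\w\\W]": /^[\w\W]$/,
  "[\\d\\D]": /^[\d\D]$/,
  "[\\s\\S]": /^[\s\S]$/,
  "[^]": /^[^]$/, // JavaScript-specific: empty negated class
};

// Hypothetical helper: does the regex accept a single "\n"?
function matchesNewline(regex) {
  return regex.test("\n");
}

// Every class above should accept the newline...
const results = Object.entries(anyCharPatterns).map(
  ([label, regex]) => [label, matchesNewline(regex)]
);

// ...while the plain dot (without the s flag) should reject it.
const dotMatchesNewline = matchesNewline(/^.$/);

console.log(results, dotMatchesNewline);
```

All four classes should report true, and the plain dot should report false.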
The dot . generally matches any character except the newline \n. Depending on the language, it may exclude more characters.
For Java, it excludes \n, \r, \u0085, \u2028, and \u2029. So a . is equivalent to [^\n\r\u0085\u2028\u2029].
For JavaScript, the dot . excludes \r, \u2028, and \u2029 in addition to \n. So . is equivalent to [^\n\r\u2028\u2029].
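The JavaScript claim can be checked with a short sketch that compares the dot against the negated class character by character. The sample list is an arbitrary choice for illustration. It includes \u0085, which Java's dot excludes but JavaScript's does not.

```javascript
// The four characters JavaScript's dot refuses to match (without the s flag).
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

const dot = /^.$/;
const negatedClass = /^[^\n\r\u2028\u2029]$/;

// A few ordinary characters plus \u0085 for contrast with Java.
const samples = [...lineTerminators, "a", " ", "\t", "\u0085"];

// For every sample, record whether each pattern accepts it.
const comparison = samples.map((ch) => ({
  codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  dot: dot.test(ch),
  negatedClass: negatedClass.test(ch),
}));

console.table(comparison);
```

The two columns should agree on every row: false for the four line terminators, true for everything else.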
Some languages have a mode that makes . match any character, without exception. It is called DOTALL mode in Java and Python, and SingleLine mode in C# and Perl.
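For JavaScript, a rough equivalent of this mode is the s flag, which was added in ES2018 and is not mentioned in the answers here. Treat its availability as an assumption about your runtime, since older engines reject it with a SyntaxError. A minimal sketch, with [\s\S] as the fallback that works everywhere:

```javascript
const text = "first\nsecond";

// Without the flag, the dot stops at the newline, so the match fails.
const withoutFlag = /first.second/.test(text);

// With the s (dotAll) flag, the dot also matches line terminators.
// Assumption: the engine supports ES2018 regular expression flags.
const withFlag = /first.second/s.test(text);

// Portable fallback for engines that lack the s flag.
const fallback = /first[\s\S]second/.test(text);

// The dotAll property reports whether the flag is set on a regex.
const flagIsSet = /./s.dotAll;

console.log({ withoutFlag, withFlag, fallback, flagIsSet });
```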
The behavior of . varies from language to language. Generally, they all agree that \n should be excluded in "normal" mode, but they may differ slightly in choosing to exclude more.
Upvotes: 4
Reputation: 46728
[\w\W] and [\d\D] are used in languages like JavaScript, which had no dotall option before ES2018.
They match all characters, including newlines, unlike ., which matches everything but a newline.
\w\W or \d\D -> matches everything including newline characters
. -> matches everything except newline characters unless
's' (dotall modifier) is specified
(\d+|) or (\d+)? -> matches 1 or more digits OR any position (null)
It could simply be written as '(\d*)'
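The three digit patterns accept the same input, but they can differ in what the capture group holds when no digits are present. The sketch below shows this in JavaScript. The helper captureFor and the anchored test patterns are illustrative choices only.

```javascript
// Anchored versions of the three patterns under discussion.
const patterns = {
  "(\\d+|)": /^(\d+|)$/,
  "(\\d+)?": /^(\d+)?$/,
  "(\\d*)": /^(\d*)$/,
};

// Hypothetical helper: return capture group 1, or null if there is no match.
function captureFor(regex, input) {
  const match = regex.exec(input);
  return match === null ? null : match[1];
}

// With digits present, all three capture the same text.
const onDigits = Object.values(patterns).map((re) => captureFor(re, "123"));

// With empty input, all three still match, but the optional group
// in (\d+)? does not participate, so its capture is undefined.
const onEmpty = Object.values(patterns).map((re) => captureFor(re, ""));

console.log(onDigits); // [ '123', '123', '123' ]
console.log(onEmpty);  // [ '', undefined, '' ]
```

So (\d*) is a drop-in replacement for (\d+|), while swapping it for (\d+)? only matters if the code checks whether the group took part in the match.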
Upvotes: 5
Reputation: 85767
You didn't say which language you're using, so I'm going to assume Perl.
(\d+|) is equivalent to (\d*). It matches a sequence of 0 or more digits and captures the result into $1. (\d)? matches 0 or 1 digit. If it matches a digit, it puts it in $1; otherwise $1 will be undef (you could rewrite it as (?:(\d)|) if you want to eliminate the ?).
[\w\W] and [\d\D] are equivalent, matching any character. A plain . is equivalent to [^\n] by default (matching any character but newline). If you really want to match any character, you should use . and specify the /s flag, which makes . match any character.
Upvotes: 2