Reputation: 23
Can you help me understand what the following regexp means:
(?:.*? rv:([\w.]+))?
So,
(?: //the pattern must be in a string, but doesn't return
. //any Unicode character except newline
* //zero or more times
? //zero or one time (how is *? different from just *)
rv: //just "rv:" apparently
[\w //any digit, an underscore, or any Latin-1 letter character
.] //...or any unicode character (are Latin-1 characters not Unicode?)
..))? //all that zero or one time
It's from "The Definitive Guide" and I hate that book. Some examples of what does and doesn't match the regexp would be much appreciated.
Upvotes: 1
Views: 112
Reputation: 121710
The regex is:
(?: # begin non capturing group
.*? # any character except a newline, zero or more times, but as few as possible:
# the engine extends this only when what follows (a space, then "rv:") fails to match
rv: # literal " rv:" (note the leading space), followed by
( # begin capturing group
[\w.] # any word character or a dot (the dot HAS NO special meaning in a character class),
+ # once or more,
) # end capturing group
) # end non capturing group
? # zero or one time
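Since the question asks for examples of what does and doesn't match, here is a minimal sketch. It assumes the pattern is used as a JavaScript regular expression, which the question does not state outright. The sample strings are made up for illustration and the helper name rvVersion is hypothetical.

```javascript
// The pattern from the question, unchanged.
const pattern = /(?:.*? rv:([\w.]+))?/;

// Hypothetical helper: returns the text captured by group 1,
// or undefined when the optional outer group matched nothing.
function rvVersion(text) {
  const match = pattern.exec(text);
  return match ? match[1] : undefined;
}

// A space, then "rv:", then word characters or dots: the group matches.
console.log(rvVersion("Mozilla/5.0 (X11; rv:10.0.2) Gecko")); // 10.0.2
console.log(rvVersion("a rv:1_2.b rest"));                    // 1_2.b

// No space before "rv:", so the group cannot match.
console.log(rvVersion("rv:1.9"));                             // undefined

// Nothing after "rv:" for [\w.]+ to consume.
console.log(rvVersion("x rv:"));                              // undefined

// No " rv:" at all.
console.log(rvVersion("no version here"));                    // undefined
```

Because the whole group is optional, the regex as a whole still succeeds on every one of these strings. In the last three cases it simply matches the empty string and leaves the capturing group unset.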
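*? is what is called a lazy quantifier: before swallowing each character, the regex engine first checks whether the rest of the pattern can match at the current position. It is used, overused and abused, and this is one case: since the next character is a literal space, it can be replaced with [^ ]* (anything which is NOT a space, zero or more times), which avoids that repeated checking altogether. Note that the two are only equivalent when no other space can appear before " rv:", because [^ ]* cannot skip over a space.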
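The difference between *? and plain *, and the effect of swapping in [^ ]*, can be checked with a small experiment. This is a sketch assuming JavaScript regex syntax. The input string and the variable names are invented for illustration.

```javascript
// Made-up input with two " rv:" occurrences and spaces before both.
const input = "one two rv:1.0 three rv:2.0";

// Original pattern: the lazy .*? stops at the first " rv:".
const lazy = /(?:.*? rv:([\w.]+))?/;
// Greedy variant: .* runs to the end, then backtracks to the last " rv:".
const greedy = /(?:.* rv:([\w.]+))?/;
// Negated-class variant: [^ ]* cannot cross the space after "one".
const negated = /(?:[^ ]* rv:([\w.]+))?/;

console.log(lazy.exec(input)[1]);    // 1.0
console.log(greedy.exec(input)[1]);  // 2.0
console.log(negated.exec(input)[1]); // undefined

// With no space before " rv:", the negated class behaves like the lazy version.
console.log(negated.exec("Gecko rv:1.9")[1]); // 1.9
```

So [^ ]* is a drop-in replacement only when the text before " rv:" is known to contain no spaces. Otherwise the optional group quietly matches nothing.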
Definitive. Right.
Upvotes: 2