Reputation: 57
I have tried to use this regex /^(\S+)(?:\?$|$)/ with the strings 'yolo' and 'yolo?'. It matches both, but on the second string ('yolo?') the ? is included in the capturing group (\S+).
Is this a bug in the regex, or have I made a mistake?
Edit: I don't want the '?' to be included in the capturing group. Sorry for my bad English.
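A minimal reproduction of the behavior described above. It assumes Python's re module, which should treat \S, (?:...) and $ the same way as the /.../ syntax in the question for these two inputs:

```python
import re

# The pattern from the question, without the /.../ delimiters
pattern = re.compile(r"^(\S+)(?:\?$|$)")

for text in ["yolo", "yolo?"]:
    match = pattern.match(text)
    # Group 1 is 'yolo' for the first input but 'yolo?' for the second
    print(repr(text), "->", repr(match.group(1)))
```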
Upvotes: 3
Views: 96
Reputation: 70750
It is doing that because \S matches any non-whitespace character (including ?) and the + quantifier is greedy.
Following the + quantifier with ? to make it non-greedy (lazy) will prevent this:
^(\S+?)\??$
Or use \w here, which matches only word characters (letters, digits and underscore), so it can never match the ?:
^(\w+)\??$
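Both suggestions can be checked with a short sketch, again assuming Python's re module as a stand-in for the regex flavor in the question:

```python
import re

# Lazy quantifier: \S+? gives up the trailing ? so that \?? can take it
lazy = re.compile(r"^(\S+?)\??$")
# \w never matches ?, so the group cannot swallow it
word = re.compile(r"^(\w+)\??$")

for text in ["yolo", "yolo?"]:
    print(repr(text), lazy.match(text).group(1), word.match(text).group(1))
```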
Upvotes: 2
Reputation: 9644
You can use one of the following.
If what you want to capture can't contain a ?, use a negated character class [^...]:
^([^\s?]+)\??$
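A quick check of the negated-class version, sketched with Python's re module. The hyphenated input is a hypothetical example showing that punctuation other than ? is still accepted:

```python
import re

# [^\s?] matches any character that is neither whitespace nor ?
negated = re.compile(r"^([^\s?]+)\??$")

print(negated.match("yolo?").group(1))   # yolo
print(negated.match("yo-lo?").group(1))  # yo-lo
```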
If what you want to capture can contain ? (for example, you want 'yolo?yolo' from 'yolo?yolo?'), you need to make your + quantifier lazy by adding ?:
^(\S+?)\??$
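The difference between the two patterns on an input with an inner ? can be sketched like this (Python's re module assumed):

```python
import re

lazy = re.compile(r"^(\S+?)\??$")
negated = re.compile(r"^([^\s?]+)\??$")

text = "yolo?yolo?"
print(lazy.match(text).group(1))  # yolo?yolo
print(negated.match(text))        # None: [^\s?]+ cannot cross the inner ?
```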
By the way, there is no need for a capturing group here: you can use a lookahead (?=...) instead and look at the whole match:
^[^\s?]+(?=\??$)
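A minimal sketch of the lookahead version, assuming Python's re module. Since there is no capturing group, the result is read from group(0), the whole match:

```python
import re

# The lookahead checks for an optional trailing ? and the end of the string
# without consuming them, so the whole match stops before the ?
lookahead = re.compile(r"^[^\s?]+(?=\??$)")

for text in ["yolo", "yolo?"]:
    print(lookahead.match(text).group(0))  # yolo both times
```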
What was happening
The rules are: quantifiers (like +) are greedy by default, and the regex engine returns the first match it finds.
Consider what this means here: \S+ first matches everything in 'yolo?', then the engine tries to match (?:\?$|$). \?$ fails (we're already at the end of the string, so there is no ? left to match), but $ matches. The regex has successfully reached its end, so the engine returns this match, in which \S+ has matched the whole string and everything is in the first capturing group.
To match what you want, you have to make the quantifier lazy (+?), or prevent the character class (yes, \S is a character class) from matching your ending delimiter ? (with [^\s?], for example).
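The point that a greedy quantifier only gives characters back when the rest of the pattern would otherwise fail can be sketched by comparing the question's pattern with a variant where the trailing ? is mandatory (the second pattern is a hypothetical illustration, using Python's re module):

```python
import re

text = "yolo?"

# $ alone succeeds after \S+ has consumed everything, so no backtracking happens
optional = re.match(r"^(\S+)(?:\?$|$)", text)
# Here the ? is required, so \S+ must give back one character for \?$ to match
mandatory = re.match(r"^(\S+)\?$", text)

print(optional.group(1))   # yolo?
print(mandatory.group(1))  # yolo
```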
Upvotes: 4
Reputation: 174844
The regex below captures all the non-space characters followed by an optional ?:
^([\S]+?)\??$
OR
^([\w]+)\??$
If you use a greedy \S+, it also matches the ? character. To separate word and non-word characters, you could use the second regex instead: it captures only the word characters and then matches the optional ? that follows them.
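A caveat on the \w version, sketched with Python's re module: \w covers only letters, digits and underscore, so an input containing other punctuation (the hyphenated string here is a hypothetical example) no longer matches at all:

```python
import re

word = re.compile(r"^([\w]+)\??$")

print(word.match("yolo?").group(1))  # yolo
print(word.match("yo-lo?"))          # None: - is not a word character
```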
Upvotes: 2
Reputation: 7835
This is the expected behavior, because \S+ matches one or more non-whitespace characters greedily, and ? is one of them.
Thus the question mark is matched in the (\S+) group, and the non-capturing group resolves to $.
You could make it work as you expect by making the match non-greedy:
/^(\S+?)(?:\?$|$)/
Alternatively, you could restrict the character class:
/^([^\s?]+)(?:\?$|$)/
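Both fixes keep the question's original alternation (?:\?$|$). A minimal check, assuming Python's re module in place of the /.../ syntax:

```python
import re

non_greedy = re.compile(r"^(\S+?)(?:\?$|$)")
restricted = re.compile(r"^([^\s?]+)(?:\?$|$)")

for text in ["yolo", "yolo?"]:
    # Both patterns keep the ? out of group 1
    print(non_greedy.match(text).group(1), restricted.match(text).group(1))
```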
Upvotes: 2