Keltere

Reputation: 57

Regex group include if condition

I tried to use this regex: /^(\S+)(?:\?$|$)/

with the strings yolo and yolo?

It matches both, but on the second string (yolo?) the ? is included in the capturing group (\S+).

Is this a bug in the regex engine, or have I made a mistake?

Edit: I don't want the '?' to be included in the capturing group. Sorry for my bad English.

Upvotes: 3

Views: 96

Answers (5)

hwnd

Reputation: 70750

It is doing that because \S matches any non-whitespace character, including ?, and the + quantifier is greedy.

Following the + quantifier with ? for a non-greedy match will prevent this.

^(\S+?)\??$

Or use \w here, which matches only word characters (letters, digits, and underscore) and so cannot match the ?.

^(\w+)\??$
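
To see how the two patterns above differ, here is a minimal sketch in Python. The question doesn't name a language, so Python's re module is an assumption; the helper name capture and the test string yo-lo? are made up for illustration.

```python
import re

# The two patterns suggested above.
lazy = re.compile(r"^(\S+?)\??$")   # lazy: group 1 takes as little as possible
word = re.compile(r"^(\w+)\??$")    # \w cannot match '?' (or '-')

def capture(pattern, text):
    """Return group 1 of a match anchored at the start, or None if no match."""
    m = pattern.match(text)
    return m.group(1) if m else None

for s in ["yolo", "yolo?", "yo-lo?"]:
    print(s, capture(lazy, s), capture(word, s))
# yolo yolo yolo
# yolo? yolo yolo
# yo-lo? yo-lo None
```

The \w version is stricter: it rejects any input containing a non-word character other than one trailing ?, which may or may not be what you want.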

Upvotes: 2

Robin

Reputation: 9644

You can use one of the following:

  • If what you want to capture can't have a ? in it, use a negated character class [^...]:

    ^([^\s?]+)\??$
    
  • If what you want to capture can have ? in it (for example, given yolo?yolo? you want yolo?yolo), you need to make the + quantifier lazy by adding ?:

    ^(\S+?)\??$
    
  • By the way, there is no need for a capturing group here: you can use a lookahead (?=...) instead and take the whole match:

    ^[^\s?]+(?=\??$)
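
The three options above can be compared side by side. This is only a sketch, assuming Python's re module (the question doesn't say which language is used); the variable names are made up for illustration.

```python
import re

negated = re.compile(r"^([^\s?]+)\??$")     # capture may not contain '?'
lazy = re.compile(r"^(\S+?)\??$")           # capture may contain '?', lazily
lookahead = re.compile(r"^[^\s?]+(?=\??$)") # no group, use the whole match

# Simple case: all three give 'yolo'.
print(negated.match("yolo?").group(1))    # yolo
print(lazy.match("yolo?").group(1))       # yolo
print(lookahead.match("yolo?").group(0))  # yolo

# Inner '?': the negated class refuses to match, the lazy version keeps it.
print(negated.match("yolo?yolo?"))        # None
print(lazy.match("yolo?yolo?").group(1))  # yolo?yolo
```

Which one to pick depends on whether a ? inside the captured text should be accepted or rejected.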
    

What was happening

The rules are: quantifiers (like +) are greedy by default, and the regex engine will return the first match it finds.

Consider what this means here:

  • \S+ will first match everything in yolo?, then the engine will try to match (?:\?$|$).
  • \?$ fails (we're already at the end of the string, so there's no ? left to match), but $ matches.

The regex has successfully reached its end, so the engine returns the match in which \S+ has matched the whole string and everything is in the first capturing group.

To match what you want, you have to make the quantifier lazy (+?), or prevent the character class (yeah, \S is a character class) from matching your ending delimiter ? (with [^\s?] for example).
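
The behaviour described above can be reproduced with a short sketch. Python's re module is assumed here (the question doesn't name a language), and the forced pattern, which makes the ? mandatory, is a hypothetical variant added only to show that a greedy quantifier gives characters back when the rest of the pattern would otherwise fail.

```python
import re

# The regex from the question: greedy \S+ followed by an optional '?'.
original = re.compile(r"^(\S+)(?:\?$|$)")
print(original.match("yolo?").group(1))  # yolo?  ($ matched, so no backtracking)

# Hypothetical variant where '?' is mandatory: now \S+ must give back
# one character for the overall match to succeed.
forced = re.compile(r"^(\S+)\?$")
print(forced.match("yolo?").group(1))    # yolo
print(forced.match("yolo"))              # None

# Lazy quantifier with the original alternation left untouched.
fixed = re.compile(r"^(\S+?)(?:\?$|$)")
print(fixed.match("yolo?").group(1))     # yolo
print(fixed.match("yolo").group(1))      # yolo
```

Greedy matching is not the problem in itself: the engine backtracks only when it has to, and in the original regex the $ alternative means it never has to.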

Upvotes: 4

Avinash Raj

Reputation: 174844

The regex below captures all the non-space characters (lazily, so that a trailing ? is left out of the group), followed by an optional ?:

^([\S]+?)\??$


OR

^([\w]+)\??$


If you use a greedy \S+, it matches the ? character as well. To separate word and non-word characters, you could use the \w regex above: it captures only the word characters and then matches the optional ? that follows them.

Upvotes: 2

Toto

Reputation: 91518

Make the + non-greedy:

^(\S+?)\??$

Upvotes: 2

Mike H-R

Reputation: 7835

This is the correct behaviour, as \S+ matches one or more non-whitespace characters greedily, and ? is one of them.

Thus the question mark is matched in the (\S+) group and the non-capturing group resolves to $. You could make it work as you expect by making the match non-greedy with:

/^(\S+?)(?:\?$|$)/


Alternatively, you could restrict the character class:

/^([^\s?]+)(?:\?$|$)/


Upvotes: 2
