dexteran

Reputation: 95

What is the purpose of ?: and .*? before \K in regular expressions?

I have a regular expression that matches words with a . in them as potential URLs, but not those with an @ in front of them, since those are assumed to be email addresses.

This is the regex that I have:

(?:\@(https?:\/\/)?(\w+(\-*\w+)*\.)[a-zA-Z\.]+[\w+\/?\#?\??\=\%\&\-]+.*?)*\K(https?:\/\/)?(\w+(\-*\w+)*\.)[a-zA-Z\.]+[\w+\/?\#?\??\=\%\&\-]+

It does not work correctly for the last email address.

For example, for the string

twitter.com facebook.com [email protected] [email protected] [email protected] [email protected] john wayne <[email protected]> 20,000.00

I expect the matches to be twitter.com and facebook.com.

But it also matches dc.com.

Upvotes: 0

Views: 309

Answers (1)

K.Dᴀᴠɪs

Reputation: 10139

In your (?:\@(https?:\/\/), the ? in https?: makes the s optional, so https? matches either http or https. The ? is a quantifier meaning 0 or 1 of the preceding character, here s. The : that follows it in https?: just matches a literal :, nothing special.
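To see the optional s in action, here is a small sketch using Python's built-in re module. It is only a convenient stand-in engine (it handles ? and literal : the same way, though it does not support \K), and the sample URLs are made up for illustration.

```python
import re

# "https?" means: the literal "http" followed by an optional "s".
# The ":" after it is an ordinary literal colon.
scheme = re.compile(r"https?:")

for text in ["http://example.com", "https://example.com", "ftp://example.com"]:
    m = scheme.match(text)
    print(text, "->", m.group(0) if m else None)
```

The first two inputs match http: and https: respectively; the ftp one does not match at all.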

The meaning changes when ?: comes right after an unescaped opening parenthesis: (?: starts a non-capturing group, which groups part of the pattern without saving what it matched as a numbered group.

Escaped: \(?: is not a non-capturing group; it matches an optional literal ( followed by a literal :
Not escaped: (?: starts a non-capturing group
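A quick way to see the difference is to compare what each form leaves behind after a match. This is a sketch in Python's re, used here only as an illustrative engine; the sample strings are hypothetical.

```python
import re

text = "https://example.com"

# (https?) is a capturing group: the matched scheme is saved as group 1.
capturing = re.match(r"(https?)://", text)
print(capturing.groups())      # ('https',)

# (?:https?) groups the same way but saves nothing.
non_capturing = re.match(r"(?:https?)://", text)
print(non_capturing.groups())  # ()

# \(?: is an optional literal "(" followed by a literal ":".
escaped = re.compile(r"\(?:")
print(escaped.match("(:").group(0))  # (:
print(escaped.match(":").group(0))   # :
print(escaped.match("?:"))           # None
```

Both grouped patterns match the same text; the only difference is whether the group's contents are kept.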


As for the next part of your question: what does the .*? in [\w+\/?\#?\??\=\%\&\-]+.*? do?

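  • . matches any character (except a newline, unless the s/DOTALL flag is set)
  • * is a quantifier that repeats the . (any character) 0 or more times
  • *? makes * non-greedy (lazy): it matches as few characters as possible while still letting the rest of the pattern match. Searching for "non-greedy" or "lazy quantifier" will turn up plenty more detail if the term is new to you.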
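The greedy/lazy difference is easiest to see on a short string with two possible end points. The sketch below uses Python's re as a stand-in engine and a made-up input; it also shows the default newline behaviour of the dot.

```python
import re

text = "<a><b>"

# Greedy: .* grabs as much as it can, then backtracks just far enough for ">".
greedy = re.match(r"<.*>", text).group(0)
print(greedy)  # <a><b>

# Lazy: .*? takes as little as it can, so it stops at the first ">".
lazy = re.match(r"<.*?>", text).group(0)
print(lazy)    # <a>

# By default "." does not match a newline; re.DOTALL changes that.
print(re.match(r"a.*b", "a\nb"))                             # None
print(repr(re.match(r"a.*b", "a\nb", re.DOTALL).group(0)))  # 'a\nb'
```

Both patterns succeed; they just choose different end points, which is why .*? inside a repeated group can change what the rest of the regex ends up matching.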

Upvotes: 4
