Rafael Almeida

Reputation: 5240

Regex only matches last occurrence reading left to right

I have this string.

Votre vol : Casablanca - Paris Mercredi 31 août 2016 AF1197 - Ecoi 7 septembre 2016 AF1196 - Economy 15:20 Paris , Charles de Gaulle (CDG) , FRANCE - Terminal 2E Heure Limite d'Enregister un supplément.

With the following regex

(?:Votre vol|Your flight)(.*?([0-9]{1,2}\s[^\s]+?\s[0-9]{4}))+

I want to capture 31 août 2016 and 7 septembre 2016 in separate groups, but only the last date is captured.

If I remove the last date, the first date is captured.

(Python Flavor)

Regex101 link

Upvotes: 2

Views: 113

Answers (1)

Wiktor Stribiżew

Reputation: 627022

Python's re module keeps only the last value captured by a repeated group, so there is no way to get all the captured substrings from a single group. You need a 2-step process:

  • Extract the substring(s) with your current regex
  • Then, extract the dates using a subpattern like re.findall(r'\b[0-9]{1,2}\s+\S+\s+[0-9]{4}\b', s) (see the regex demo).
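The two steps above can be sketched as follows. This is a minimal illustration using the string and patterns from the question and this answer; the names `OUTER`, `DATE` and `extract_dates` are made up for the example.

```python
import re

s = ("Votre vol : Casablanca - Paris Mercredi 31 août 2016 AF1197 - Ecoi "
     "7 septembre 2016 AF1196 - Economy 15:20 Paris , Charles de Gaulle (CDG) , "
     "FRANCE - Terminal 2E Heure Limite d'Enregister un supplément.")

# Step 1 pattern: the regex from the question, unchanged.
OUTER = re.compile(r"(?:Votre vol|Your flight)(.*?([0-9]{1,2}\s[^\s]+?\s[0-9]{4}))+")
# Step 2 pattern: matches a single date such as "31 août 2016".
DATE = re.compile(r"\b[0-9]{1,2}\s+\S+\s+[0-9]{4}\b")


def extract_dates(text):
    """Return every date inside the part of text matched by OUTER."""
    m = OUTER.search(text)
    if m is None:
        return []
    # m.group(0) is the whole matched substring, from "Votre vol" to the last date.
    return DATE.findall(m.group(0))


# A repeated group only remembers its final iteration in re.
last_only = OUTER.search(s).group(2)
dates = extract_dates(s)

print(last_only)  # 7 septembre 2016
print(dates)      # ['31 août 2016', '7 septembre 2016']
```

Running `findall` on `m.group(0)` rather than on the whole input keeps the search limited to the flight segment that the first regex identified.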

With the PyPI regex module, you could get all the necessary results in a single pass, since that library stores every capture of each group (available through the match object's captures() method).
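A possible one-pass version with the third-party regex package (installed with `pip install regex`) might look like this. It is a sketch, not tested against every input: the pattern is a slightly simplified form of the one in the question with a single capturing group, `dates_one_pass` is a made-up helper name, and the import is guarded so the snippet still loads where the package is missing.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

s = ("Votre vol : Casablanca - Paris Mercredi 31 août 2016 AF1197 - Ecoi "
     "7 septembre 2016 AF1196 - Economy 15:20 Paris , Charles de Gaulle (CDG) , "
     "FRANCE - Terminal 2E Heure Limite d'Enregister un supplément.")

# Same idea as the question's regex, with only the date captured (group 1).
PATTERN = r"(?:Votre vol|Your flight)(?:.*?([0-9]{1,2}\s\S+\s[0-9]{4}))+"


def dates_one_pass(text):
    """Return all captures of group 1, or None if regex is not installed."""
    if regex is None:
        return None
    m = regex.search(PATTERN, text)
    if m is None:
        return []
    # captures(1) lists every value group 1 took, not just the last one.
    return m.captures(1)


result = dates_one_pass(s)
print(result)
```

With the package installed, `m.group(1)` would still give only the last date, while `m.captures(1)` returns both.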

A small note on your regex: [^\s]+?\s can be written as \S+\s. [^\s] matches any char other than whitespace, which is exactly what the shorthand \S does (\S is the opposite of \s). The lazy +? quantifier also makes matching a bit slower than a greedy +, and since the run of non-whitespace chars must be followed by \s either way, both versions match the same text.
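A quick way to check that the two spellings behave the same on this kind of input is sketched below. The sample string is a shortened piece of the question's text, and the variable names are made up for the example.

```python
import re

# Original spelling: negated class with a lazy quantifier.
lazy = re.compile(r"[0-9]{1,2}\s[^\s]+?\s[0-9]{4}")
# Suggested spelling: shorthand class with a greedy quantifier.
greedy = re.compile(r"[0-9]{1,2}\s\S+\s[0-9]{4}")

sample = "Mercredi 31 août 2016 AF1197 - Ecoi 7 septembre 2016 AF1196"

lazy_hits = lazy.findall(sample)
greedy_hits = greedy.findall(sample)

# Neither [^\s] nor \S can consume whitespace, so both stop at the same place.
print(lazy_hits == greedy_hits)  # True
print(greedy_hits)               # ['31 août 2016', '7 septembre 2016']
```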

Upvotes: 1
