Mag_Amine

Reputation: 175

Regular Expression for month and year excluding some words

I'm trying to write a regular expression that handles inputs like the ones below and extracts the month and year as two groups (start and end), like this:

From August 2017 - September 2018   (output: {August 2017},{September 2018})
From August to September 2018       (output: {August},{September 2018})
July 2009 - August 2019             (output: {July 2009},{August 2019})
De Aout 2019 a July 2020            (output: {Aout 2019},{July 2020})
De Juillet a Aout 2020              (output: {Juillet},{Aout 2020})
Juillet - Aout 2019                 (output: {Juillet},{Aout 2019})
Juillet a Aout 2019                 (output: {Juillet},{Aout 2019})

I found this regex here which does a good job (regex101 link):

(?P<fmonth>\w+.\d*)\s+\D+\s+(?P<smonth>\D+.\d+)

The problem is that it does not handle the two cases where the first part has no year:

De Juillet a Aout 2020
From August to September 2018

I think it's missing a part to exclude/ignore specific words like "From" and "De".

Any ideas or solutions?

Upvotes: 2

Views: 212

Answers (1)

Wiktor Stribiżew

Reputation: 626845

Note that \D+ is a very generic pattern: it matches any 1+ non-digit characters, so in From August to September 2018 it matches August to. Also, \w matches letters, digits and underscores. When you need to match month names, it is more appropriate to match letters only, and for that all you need is to subtract \d and _ from \w: [^\W\d_].
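To make the difference concrete, here is a small Python sketch of how these building blocks behave on their own. The test strings are illustrative, taken or adapted from the question, and the variable names are placeholders.

```python
import re

text = "From August to September 2018"

# \D+ consumes every non-digit character, including spaces and whole words,
# so from the start of the string it runs up to the first digit.
non_digits = re.match(r"\D+", text).group()
print(repr(non_digits))

# \w also accepts digits and underscores, so it cannot isolate a month name.
word_chars = re.findall(r"\w+", "Aout_2020 July")
print(word_chars)

# [^\W\d_] is "\w minus digits and underscore", i.e. letters only.
letters = re.findall(r"[^\W\d_]+", "Aout_2020 July")
print(letters)
```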

You may use a bit more precise regex:

(?P<fmonth>[^\W\d_]+(?:\W+\d+)?)\s+(?:to|a|-)\s+(?P<smonth>[^\W\d_]+\W+\d+)

See the regex demo

Details

  • (?P<fmonth>[^\W\d_]+(?:\W+\d+)?) - fmonth group: 1+ letters and an optional sequence of 1+ non-word chars followed by 1+ digits
  • \s+ - 1+ whitespaces
  • (?:to|a|-) - to, a or -
  • \s+ - 1+ whitespaces
  • (?P<smonth>[^\W\d_]+\W+\d+) - smonth group: 1+ letters, 1+ non-word chars, 1+ digits
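As a quick check, the pattern can be run against the sample inputs from the question. This is a minimal sketch that assumes Python's re module; extract_range is a hypothetical helper name used only for illustration.

```python
import re

# Pattern copied from the answer; sample strings come from the question.
PATTERN = re.compile(
    r"(?P<fmonth>[^\W\d_]+(?:\W+\d+)?)\s+(?:to|a|-)\s+(?P<smonth>[^\W\d_]+\W+\d+)"
)

SAMPLES = [
    "From August 2017 - September 2018",
    "From August to September 2018",
    "July 2009 - August 2019",
    "De Aout 2019 a July 2020",
    "De Juillet a Aout 2020",
    "Juillet - Aout 2019",
    "Juillet a Aout 2019",
]


def extract_range(text):
    """Return (start, end) as strings, or None when the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("fmonth"), match.group("smonth")


for sample in SAMPLES:
    print(f"{sample!r} -> {extract_range(sample)}")
```

Leading words such as From and De are skipped without listing them explicitly: the word that follows them is not one of the separators (to, a, -), so the search moves on until it reaches the month name. The match is case-sensitive, so the separator a does not match the A at the start of August or Aout.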

Upvotes: 2
