Olivia Brown

Reputation: 632

How to write a regular expression to extract years

How can I write a regular expression to extract years from text? The years may appear in the following forms:

Case 1:
1970 - 1980 --> 1970, 1980
January 1920 - Feb 1930 --> 1920, 1930
May 1920 to September 1930 --> 1920, 1930
Case 2:
July 1945 --> 1945

Writing a regular expression for Case 1 is easy, but how can I handle Case 2 along with it?

\d{4} \s? (?: [^a-zA-Z0-9] | to) \s? \w+? \d{4}

Upvotes: 0

Views: 116

Answers (2)

Brett7533

Reputation: 342

For your requirement, just match all four-digit numbers:

import re
s = '''1970 - 1980
January 1920 - Feb 1930
May 1920 to September 1930
July 1945'''

# \b on both sides restricts matches to standalone four-digit numbers
p = re.compile(r'\b\d{4}\b')

# Print the years found on each line
for line in s.splitlines():
    print(p.findall(line))

Output:

['1970', '1980']
['1920', '1930']
['1920', '1930']
['1945']
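
One caveat: `\b\d{4}\b` matches any standalone four-digit number, not only years. If the text can contain other numbers, a narrower pattern may help. The following is only a sketch; the range 1000 through 2099 and the sample sentence are assumptions, not something the question specifies.

```python
import re

# Assumption: only 1000 through 2099 count as plausible years.
YEAR = re.compile(r'\b(?:1\d{3}|20\d{2})\b')

# Hypothetical sample text with a non-year number in it.
text = 'Room 4021, built July 1945, renovated 1970 - 1980'
print(YEAR.findall(text))  # ['1945', '1970', '1980']
```

Adjust the alternation if the data needs a different range.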

Upvotes: 2

Srdjan M.

Reputation: 3405

Regex: .*?([0-9]{4})(?:.*?([0-9]{4}))? or .*?(\d{4})(?:.*?(\d{4}))?

Details:

  • () Capturing group
  • (?:) Non capturing group
  • {n} Matches exactly n times
  • .*? Matches any character (except a newline, by default) zero or more times, as few times as possible (lazy)
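
To see why the lazy quantifier matters here, this small sketch compares it with the greedy form on one of the sample strings from the question. The variable names are only illustrative.

```python
import re

s = 'January 1920 - Feb 1930'

# Lazy: stops at the first place where four digits follow.
lazy = re.match(r'.*?(\d{4})', s).group(1)
# Greedy: consumes as much as possible, so it captures the last year.
greedy = re.match(r'.*(\d{4})', s).group(1)

print(lazy, greedy)  # 1920 1930
```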

Python code:

import re

def Years(text):
    return re.findall(r'.*?([0-9]{4})(?:.*?([0-9]{4}))?', text)

print(Years('January 1920 - Feb 1930'))

Output:

[('1920', '1930')]
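
For Case 2 the optional second group does not take part in the match, so `re.findall` returns an empty string in that slot. A possible way to get a flat list of years is sketched below; `PAIR` and `flat_years` are hypothetical names introduced for illustration.

```python
import re

PAIR = re.compile(r'.*?(\d{4})(?:.*?(\d{4}))?')

def flat_years(text):
    """Return every captured year as a flat list, skipping empty groups."""
    return [year for pair in PAIR.findall(text) for year in pair if year]

print(PAIR.findall('July 1945'))  # [('1945', '')]
print(flat_years('May 1920 to September 1930\nJuly 1945'))  # ['1920', '1930', '1945']
```

Because `.` does not match a newline by default, each line of a multi-line string is matched separately.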

Upvotes: 0
