match multiple OR conditions in python 3 regex findall

Question

In Python 3:

This is the Office of Foreign Assets Control list of individuals whose assets should be monitored:

https://www.treasury.gov/ofac/downloads/sdn.csv

A lot of the dates of birth (in the very last column, comma-delimited) look like

DOB 23 Jun 1959; alt. DOB 23 Jun 1958

or

DOB 1959; alt. DOB 1958
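Since that column sits at the end of each comma-delimited row, and quoted fields can themselves contain commas, Python's csv module is a safer way to pull it out than splitting on ','. A minimal sketch, assuming the remarks are the last field as described above. The two rows below are made up for illustration and do not reproduce the real column layout of sdn.csv:

```python
import csv
import io

# Hypothetical stand-in for sdn.csv: made-up rows, not the real column layout.
# The second field contains a comma inside quotes, which str.split(',') would
# break apart but csv.reader keeps as one field.
sample = io.StringIO(
    '101,"EXAMPLE, ONE","DOB 23 Jun 1959; alt. DOB 23 Jun 1958"\n'
    '102,"EXAMPLE, TWO","DOB 1959; alt. DOB 1958"\n'
)

# Keep only the last field of every row (the DOB remarks in this sketch).
last_column = [row[-1] for row in csv.reader(sample)]
print(last_column)
# ['DOB 23 Jun 1959; alt. DOB 23 Jun 1958', 'DOB 1959; alt. DOB 1958']
```

For the real file, the same comprehension could run over csv.reader on a file opened with open('sdn.csv', newline=''); the file's encoding is not stated here, so an explicit encoding argument may be needed.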

I am trying to capture all the birthdates after the keywords "DOB" and "alt. DOB" with the following code:

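    import re
    from datetime import datetime

    def parse_dob(x):  # hypothetical name: the enclosing function was not shown
        if len(x.split(';')) > 0:
            # Only the first ';'-separated segment is searched
            if len(re.findall(r'DOB (.*)', x.split(';')[0])) > 0:
                new = re.findall(r'DOB | alt. DOB (.*)', x.split(';')[0])[0]
                print(new)

                try:
                    print(datetime.strptime(new, '%d %b %Y'))
                    return datetime.strptime(new, '%d %b %Y')
                except ValueError:
                    return None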

But the code only gets the birthdate right after "DOB" and not the one after "alt. DOB". How can I capture both? Thank you.
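For reference, here is what the pattern in the snippet above actually matches, using the first sample value. The | alternation applies to everything on either side of it, so 'DOB | alt. DOB (.*)' means "DOB " or " alt. DOB (.*)". The first alternative has no group, so findall reports an empty string for it. On top of that, x.split(';')[0] keeps only the text before the first semicolon, so "alt. DOB" is never searched:

```python
import re

x = "DOB 23 Jun 1959; alt. DOB 23 Jun 1958"
pattern = r'DOB | alt. DOB (.*)'

# split(';')[0] drops everything from the first ';' on, including "alt. DOB".
first_segment = x.split(';')[0]
print(first_segment)  # DOB 23 Jun 1959

# Only the "DOB " alternative matches here, and it has no group, so findall
# reports an empty string for group 1.
on_segment = re.findall(pattern, first_segment)
print(on_segment)  # ['']

# On the full string the second alternative also fires and captures the rest.
on_full = re.findall(pattern, x)
print(on_full)  # ['', '23 Jun 1958']
```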

The fourth bird · Accepted Answer

You could match DOB and use a capturing group for the date part. In the date part, the day and the month are optional, followed by 4 digits for the year.

The date part pattern does not validate the date itself; it only makes the match a bit more specific.

\bDOB ((?:(?:3[01]|[12][0-9]|0?[1-9]) [A-Za-z]+ )?\d{4})\b
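The capturing group matters because of how re.findall reports results: with no groups it returns whole matches, with exactly one group it returns only that group's text, and with several groups it returns one tuple per match. A small comparison using simplified patterns (not the one above), written only to show that behavior:

```python
import re

text = "DOB 23 Jun 1959; alt. DOB 23 Jun 1958"

# No group: findall returns the whole match, keyword included.
whole = re.findall(r"\bDOB \d{1,2} [A-Za-z]+ \d{4}", text)
print(whole)  # ['DOB 23 Jun 1959', 'DOB 23 Jun 1958']

# One group: findall returns only what the group captured.
one_group = re.findall(r"\bDOB (\d{1,2} [A-Za-z]+ \d{4})", text)
print(one_group)  # ['23 Jun 1959', '23 Jun 1958']

# Two groups: findall returns a tuple of both captures per match.
two_groups = re.findall(r"\bDOB (\d{1,2}) ([A-Za-z]+ \d{4})", text)
print(two_groups)  # [('23', 'Jun 1959'), ('23', 'Jun 1958')]
```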

Explanation

\bDOB Match DOB literally, preceded by a word boundary
( Capture group 1
- (?: Non capture group
  - (?:3[01]|[12][0-9]|0?[1-9]) [A-Za-z]+  Match a day number 1-31 (optionally zero-padded), a space, 1+ chars A-Za-z and a trailing space
- )? Close group and make it optional
- \d{4} Match 4 digits (the year)
)\b Close group 1 followed by a word boundary
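As noted above, the pattern narrows the day to 1-31 but does not check the month name or whether the day exists in that month. A quick illustration with deliberately bogus inputs (made up for this check), using the same pattern:

```python
import re

regex = r"\bDOB ((?:(?:3[01]|[12][0-9]|0?[1-9]) [A-Za-z]+ )?\d{4})\b"

# Accepted: 31 is a valid day number and any word is accepted as the month.
accepted = re.findall(regex, "DOB 31 Feb 1960; alt. DOB 5 Foo 1961")
print(accepted)  # ['31 Feb 1960', '5 Foo 1961']

# Rejected: 32 is outside the 1-31 alternation, so nothing is captured.
rejected = re.findall(regex, "DOB 32 Jun 1959")
print(rejected)  # []
```

Checking the captured strings afterwards, for example with datetime.strptime as in the question, would still be needed to reject values like the accepted ones here.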


For example:

import re

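# Group 1: optional "day month " prefix followed by a 4-digit year
regex = r"\bDOB ((?:(?:3[01]|[12][0-9]|0?[1-9]) [A-Za-z]+ )?\d{4})\b"
# Two sample values, one per line
test_str = ("DOB 23 Jun 1959; alt. DOB 23 Jun 1958\n"
    "DOB 1959; alt. DOB 1958")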

print(re.findall(regex, test_str))

Output

['23 Jun 1959', '23 Jun 1958', '1959', '1958']
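To turn the captures into datetime objects as the question's code does, one option is to try the full format first and fall back to a year-only format. A sketch, assuming '%d %b %Y' and '%Y' are the only two shapes in the data (only those two appear in the question); to_datetime is a hypothetical helper name:

```python
import re
from datetime import datetime

regex = r"\bDOB ((?:(?:3[01]|[12][0-9]|0?[1-9]) [A-Za-z]+ )?\d{4})\b"

def to_datetime(value):
    """Hypothetical helper: parse 'D Mon YYYY' or 'YYYY', else return None."""
    for fmt in ('%d %b %Y', '%Y'):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass  # this format did not fit, try the next one
    return None

remarks = "DOB 23 Jun 1959; alt. DOB 1958"
dates = [to_datetime(v) for v in re.findall(regex, remarks)]
print(dates)
# [datetime.datetime(1959, 6, 23, 0, 0), datetime.datetime(1958, 1, 1, 0, 0)]
```

Note that strptime fills a year-only value in as 1 January, so it can no longer be told apart from a real 1 January birthdate; keeping the original string alongside the datetime avoids losing that distinction.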
