user2604504

Reputation: 717

Merge multiple regular expressions for date recognition

I am writing a Python 2.7.6 program that finds all instances of dates in an input file (e.g. if a file contains "April 9th, 2014", "Tuesday", "02/14/1980", and "Christmas", it will find all of them). Since dates can be represented in many different ways, I have separate regular expressions for the different types of dates. I want to merge all my separate regular expressions into one big regular expression so that it finds each "type" of date in the order they appear in the file.

I have the following code to test for dates like "April 9th, 2014"

matches = re.findall("(?:((?:jan(?:(?:.)?|(?:uary)?)|feb(?:(?:.)?|(?:ruary)?)|mar(?:(?:.)?|(?:ch)?)|apr(?:(?:.)?|(?:il)?)|may|jun(?:(?:.)?|(?:e)?)|jul(?:(?:.)?|(?:y)?)|aug(?:(?:.)?|(?:gust)?)|sep(?:(?:.)?|(?:ept(?:(?:.)?))?|(?:tember)?)|oct(?:(?:.)?|(?:ober)?)|nov(?:(?:.)?|(?:ember)?)|dec(?:(?:.)?|(?:ember)?)) (?:[123][0-9]|[1-9])[ \t\r\f\v]?(?:rd|st|th)?(?:,)?[ \t\r\f\v]?(?:[0-2][0-9][0-9][0-9])?)|(?:(?:[0]?[1-9])|(?:[1][0-2]))[-/](?:(?:[012]?[0-9])|(?:[3][01]))[/-][12][0-9][0-9][0-9])",fileText,re.IGNORECASE)
print matches

On the next line I match dates similar to "02/14/1980" like this:

matches = re.findall("(?:(?:[0]?[1-9])|(?:[1][0-2]))[-/](?:(?:[012]?[0-9])|(?:[3][01]))[/-][12][0-9][0-9][0-9]",fileText, re.IGNORECASE)
print matches

I want to merge them into one regular expression. I tried doing:

matches = re.findall("(?:first regular expression|second regular expression)", fileText, re.IGNORECASE)
print matches

But this just printed all dates like "April 9th, 2014" (which was what the first regular expression was for) and '' for all dates that look like "02/14/1980" (which was what the second regular expression was for).
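The empty strings are consistent with how re.findall treats capturing groups: when a pattern contains exactly one capturing group, findall returns that group's text rather than the whole match, and the group is empty whenever a different alternative matched. A minimal reproduction, assuming two simplified stand-in patterns (month_day and numeric are illustrative names, not the original expressions):

```python
import re

text = "April 9, 2014 and 02/14/1980"

# Simplified stand-ins for the two date patterns (assumptions, not the originals).
month_day = r"[a-z]+ \d{1,2}, \d{4}"
numeric = r"\d{2}/\d{2}/\d{4}"

# A capturing group wraps only the first alternative, so findall returns
# that group's text: '' whenever the second alternative is the one that matched.
with_group = re.findall(
    "(?:(" + month_day + ")|" + numeric + ")", text, re.IGNORECASE)

# With no capturing groups at all, findall returns the whole match instead.
without_group = re.findall(
    "(?:" + month_day + "|" + numeric + ")", text, re.IGNORECASE)
```

Here with_group contains an empty string for the numeric date, while without_group contains both dates as written.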

Any help in figuring out how to combine the two regexes into one would be greatly appreciated.

Upvotes: 1

Views: 175

Answers (1)

Cam

Reputation: 478

What about just checking each input line against each regex?

for line in input_file:
    # pattern1 and pattern2 stand for the two separate date regexes
    regex1 = re.findall(pattern1, line)
    regex2 = re.findall(pattern2, line)
    # findall returns a list, so an empty result simply skips its loop
    for item in regex1:
        print(item)
    for item in regex2:
        print(item)
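Note that checking each regex separately prints all matches of the first pattern before any matches of the second, so the order of appearance within a line is lost. If that order matters, one possible alternative is to join the patterns with | and iterate with re.finditer, reading the whole match from each match object. The sketch below assumes heavily simplified patterns (with no word-boundary checks) and hypothetical names (PATTERNS, find_dates); the real expressions would take their place:

```python
import re

# Simplified, hypothetical stand-ins for the real date patterns.
PATTERNS = [
    ("month_day",
     r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"
     r" \d{1,2}(?:st|nd|rd|th)?,? \d{4}"),
    ("numeric", r"\d{1,2}[-/]\d{1,2}[-/]\d{4}"),
    ("weekday", r"(?:mon|tues|wednes|thurs|fri|satur|sun)day"),
]

# Wrap each pattern in a named group so the match can report its type.
combined = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in PATTERNS),
    re.IGNORECASE)


def find_dates(text):
    """Return (type, matched text) pairs in order of appearance."""
    # m.group(0) is always the whole match, regardless of inner groups;
    # m.lastgroup is the name of the alternative that matched.
    return [(m.lastgroup, m.group(0)) for m in combined.finditer(text)]


found = find_dates("Due Tuesday, or April 9th, 2014; born 02/14/1980.")
```

The inner groups are all non-capturing, so only the named wrapper groups capture, which is what lets lastgroup identify the date type.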

Upvotes: 1
