Manoj G

IllegalMonthError in Python datefinder

I am trying to extract dates from email texts using the datefinder Python library.

Below is a code snippet of what I am trying to do.

import datefinder

# body is a list of email texts

email_dates=[]
for b in body: 
    dates = datefinder.find_dates(b)
    date = []
    for d in dates:
        date.append(d)

    email_dates.append(date)

datefinder tries to interpret every number in an email as a date, so I get a lot of false positives. I can filter those out with some extra logic. However, some emails raise an IllegalMonthError, and I cannot get past the error to retrieve dates from the remaining emails. Below is the error:

---------------------------------------------------------------------------
IllegalMonthError                         Traceback (most recent call last)
c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    654         try:
--> 655             ret = self._build_naive(res, default)
    656         except ValueError as e:

c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in _build_naive(self, res, default)
   1237 
-> 1238             if cday > monthrange(cyear, cmonth)[1]:
   1239                 repl['day'] = monthrange(cyear, cmonth)[1]

c:\python\python38\lib\calendar.py in monthrange(year, month)
    123     if not 1 <= month <= 12:
--> 124         raise IllegalMonthError(month)
    125     day1 = weekday(year, month, 1)

IllegalMonthError: bad month number 42; must be 1-12

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-39-1fbacc8ca3f6> in <module>
      7     dates = datefinder.find_dates(b)
      8     date = []
----> 9     for d in dates:
     10         date.append(d)
     11 

c:\python\python38\lib\site-packages\datefinder\__init__.py in find_dates(self, text, source, index, strict)
     30         ):
     31 
---> 32             as_dt = self.parse_date_string(date_string, captures)
     33             if as_dt is None:
     34                 ## Dateutil couldn't make heads or tails of it

c:\python\python38\lib\site-packages\datefinder\__init__.py in parse_date_string(self, date_string, captures)
    100         # otherwise self._find_and_replace method might corrupt them
    101         try:
--> 102             as_dt = parser.parse(date_string, default=self.base_date)
    103         except (ValueError, OverflowError):
    104             # replace tokens that are problematic for dateutil

c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(timestr, parserinfo, **kwargs)
   1372         return parser(parserinfo).parse(timestr, **kwargs)
   1373     else:
-> 1374         return DEFAULTPARSER.parse(timestr, **kwargs)
   1375 
   1376 

c:\python\python38\lib\site-packages\dateutil\parser\_parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    655             ret = self._build_naive(res, default)
    656         except ValueError as e:
--> 657             six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)
    658 
    659         if not ignoretz:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

For example, if this error occurs in the 5th email, I cannot retrieve dates from the 5th email onwards. How can I bypass this error, skip the entries that cause it, and still retrieve all the other dates?

Thanks in advance!

Answers (2)

Pietro

Use a try/except block:

import datefinder
from calendar import IllegalMonthError

try:
    datefinder.find_dates(b)
except IllegalMonthError as e:
    # this will print the error, but will not stop the program
    print(e)
except Exception:
    # any other unexpected error will be propagated
    raise
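Catching IllegalMonthError alone may not be enough, though. The bottom of the question's traceback shows dateutil catching the original error (IllegalMonthError is a ValueError subclass) and then evaluating e.args[0] + ": %s" to build its own message. Since the first argument is the int month, that expression itself fails, so the exception that reaches the caller is a TypeError. The chain can be reproduced with the standard library alone; reproduce_masked_error is a hypothetical helper for illustration, and the string expression is copied from the traceback (other dateutil versions may behave differently):

```python
import calendar


def reproduce_masked_error(month: int = 42) -> tuple:
    """Mimic the error chain from the question's traceback (illustration only)."""
    try:
        # calendar.monthrange rejects months outside 1-12 with IllegalMonthError
        calendar.monthrange(2020, month)
    except ValueError as original:
        try:
            # same expression as in the traceback's dateutil frame: int + str
            _ = original.args[0] + ": %s"
        except TypeError as masked:
            return original, masked
    raise AssertionError("expected monthrange to reject the month")


original, masked = reproduce_masked_error()
print(type(original).__name__, original)
print(type(masked).__name__, masked)
```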

Update after the question was edited:

Notice that the traceback shows that the exception is raised at this line:

----> 9     for d in dates:

Indeed, the documentation for find_dates says that it returns a generator:

Returns a generator that produces datetime.datetime objects, or a tuple with the source text and index, if requested
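In other words, calling a generator function runs none of its body; the code only executes as items are requested. A toy generator makes this visible. This is a minimal sketch: parse_all is a hypothetical stand-in for find_dates, using int() in place of date parsing.

```python
def parse_all(tokens):
    """Toy stand-in for a lazy finder: each item is parsed only when requested."""
    for token in tokens:
        yield int(token)  # int("oops") raises ValueError, but only when reached


# Calling the function only creates a generator object; no parsing runs yet.
gen = parse_all(["1", "oops", "3"])

results = []
error = None
try:
    for value in gen:  # the body of parse_all runs here, one item per step
        results.append(value)
except ValueError as e:  # so the error surfaces here, not at parse_all(...)
    error = e

print(results, repr(error))
```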

So the actual parsing of the dates is not done when you call find_dates, but when you iterate over the results. This makes it trickier to wrap in a try/except: you have to advance the generator one item at a time, wrapping each call to next() in its own try/except block:

from datefinder import find_dates

string_with_dates = """
...
entries are due by January 4th, 2017 at 8:00pm
...
created 01/15/2005 by ACME Inc. and associates.
...
Liverpool NY 13088 42 cases
"""

matches = find_dates(string_with_dates)
print(type(matches))  # <class 'generator'>

while True:

    try:
        m = next(matches)

    # this is the exception seen by the program, rather than IllegalMonthError
    except TypeError as e:
        print(f"TypeError {e}")
        continue

    # the generator has no more items
    except StopIteration as e:
        print(f"StopIteration {e}")
        break

    # any other unexpected error will be propagated
    except Exception:
        raise

    print(f"m {m}")

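You can do whatever you need with m. Keep in mind that a generator cannot resume once it has raised an exception: after the TypeError, the next call to next() raises StopIteration, so any dates after the offending match in the same text are lost. The loop still ends cleanly, though, so you can move on to the next email.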
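Applied to the question's loop over emails, the same idea can be packaged as a small helper that drains one email's matches and returns whatever was found before an error. This is a minimal sketch, not part of datefinder: collect_until_error and fake_finder are hypothetical names, fake_finder only simulates the failing generator so the demo runs without datefinder installed, and catching ValueError in addition to TypeError is a defensive assumption in case a different library version surfaces the original error directly.

```python
from typing import Iterator, List, Tuple, Type


def collect_until_error(
    matches: Iterator,
    errors: Tuple[Type[BaseException], ...] = (TypeError, ValueError),
) -> List:
    """Drain a lazy iterator, keeping every item produced before a listed error.

    A generator is finished once it raises, so items after the failing one
    are lost; the caller simply moves on to the next text.
    """
    collected = []
    try:
        for item in matches:
            collected.append(item)
    except errors as e:
        print(f"stopped early: {e!r}")
    return collected


def fake_finder(text):
    # Hypothetical stand-in for datefinder.find_dates, used only for this demo.
    for word in text.split():
        if word == "BAD":
            raise TypeError("simulated dateutil failure")
        yield word


body = ["a b", "c BAD d", "e"]
email_dates = [collect_until_error(fake_finder(b)) for b in body]
print(email_dates)
```

With datefinder installed, the question's loop would become email_dates = [collect_until_error(datefinder.find_dates(b)) for b in body], so a problematic email contributes only the dates found before the failure instead of stopping the whole loop.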

Cheers!

Akhil Kashyap

You could add an if statement that checks whether the given month is within the range 1-12, and append to date only if it is.
