user1424739

Reputation: 13675

How to parse this time format?

The following example shows that dateutil.parser.parse cannot parse this string:

Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)

Which Python method can parse it, as well as the following?

Thu, 16 Dec 2010 12:14:05 +0000

I tried:

$ ./main.py 
Traceback (most recent call last):
  File "./main.py", line 5, in <module>
    date = parser.parse('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dateutil/parser.py", line 1008, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dateutil/parser.py", line 395, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

$ cat ./main.py 
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import dateutil.parser as parser
date = parser.parse('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)')

Upvotes: 2

Views: 871

Answers (2)

alecxe

Reputation: 473873

You can use "fuzzy" mode, which ignores tokens the parser does not recognize and so handles both variants:

In [7]: parser.parse('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)', fuzzy=True)
Out[7]: datetime.datetime(2014, 5, 27, 20, 6, 8, tzinfo=tzoffset(None, 28800))

In [8]: parser.parse('Thu, 16 Dec 2010 12:14:05 +0000', fuzzy=True)
Out[8]: datetime.datetime(2010, 12, 16, 12, 14, 5, tzinfo=tzutc())
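Since fuzzy mode silently drops whatever it does not understand, it can help to cross-check the result with a stricter parser. Here is a minimal sketch that swaps in the standard library's email.utils.parsedate_to_datetime instead of dateutil. It assumes Python 3.3 or later and that the inputs are email-style (RFC 2822) dates, which both strings in the question appear to be; the variable names are only for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Assumption: both inputs are email-style (RFC 2822) date strings.
# This is a standard-library alternative, not the dateutil call shown above.
with_comment = parsedate_to_datetime('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)')
plain = parsedate_to_datetime('Thu, 16 Dec 2010 12:14:05 +0000')

# Both results are timezone-aware datetime objects.
print(with_comment.isoformat())
print(plain.isoformat())
```

If the two parsers agree on your real data, the fuzzy result is probably what you want; if they disagree, inspect the offending string by hand.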

Upvotes: 0

Stephen Rauch

Reputation: 49794

If the extra text is at the end of the string and its format is unknown, you can trim trailing characters one at a time until the remaining string parses:

Code:

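import dateutil.parser as parser

def parse_datetime_remove_useless_end(date_str):
    # Try the full string first, then drop one trailing character per attempt.
    for i in range(len(date_str), 0, -1):
        try:
            return parser.parse(date_str[:i])
        except ValueError:
            pass
    return None  # no prefix of the string could be parsed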

Test Code:

import dateutil.parser as parser

print(parse_datetime_remove_useless_end('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)'))
print(parse_datetime_remove_useless_end('Thu, 16 Dec 2010 12:14:05 +0000'))

Results:

2014-05-27 20:06:08+08:00
2010-12-16 12:14:05+00:00
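For the opposite case, where the trailing text has a known shape, a targeted strip may be simpler than trimming character by character. The sketch below assumes the extra text is always a single parenthesized comment at the end, as in the question's example. The helper name strip_trailing_comment and the FORMAT string are hypothetical, and datetime.strptime is used here only to keep the example free of third-party imports; the cleaned string could just as well be passed to parser.parse.

```python
import re
from datetime import datetime, timedelta

# Matches one trailing "(...)" group, including the whitespace around it.
_TRAILING_COMMENT = re.compile(r"\s*\([^()]*\)\s*$")

def strip_trailing_comment(date_str):
    """Return date_str without a single trailing parenthesized comment."""
    return _TRAILING_COMMENT.sub("", date_str)

# Assumption: English day and month names (the default C locale).
FORMAT = "%a, %d %b %Y %H:%M:%S %z"

for raw in ('Tue, 27 May 2014 20:06:08 +0800 (GMT+08:00)',
            'Thu, 16 Dec 2010 12:14:05 +0000'):
    cleaned = strip_trailing_comment(raw)
    print(datetime.strptime(cleaned, FORMAT))
```

Strings without a trailing comment pass through unchanged, so the same helper can be applied to both variants.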

Upvotes: 1
