I have strings that look something like this:
"Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."
I attempted to use the parser from dateutil:
from dateutil.parser import parse
s = "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."
dt = parse(s, fuzzy=True)
print(dt)
However, I'm getting the following error:
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 1374, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 649, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V.
How can I extract the time and the date from this string? Is there a regex that'll allow me to do so in a single line?
Edit: Ideally, I'm looking for a solution I can easily apply to an entire column in a pandas dataframe.
Use the re module to pull out the timestamp with a regular expression first, then parse only that substring:
from dateutil.parser import parse
import re
s = "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."
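# Raw string avoids invalid-escape warnings; captures "HH:MM:SS MM/DD/YYYY"
t = re.search(r' (\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4}) ', s).group(1)
# dateutil reads 02/07/2019 as month/day by default; pass dayfirst=True if your data is day/month
dt = parse(t)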
print(dt)
Output:
2019-02-07 21:50:00
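Because the timestamp always has the same layout, the standard library's datetime.strptime can parse the captured text directly, without dateutil. A minimal sketch, assuming the date part is month/day/year (the same reading dateutil gives above):

```python
import re
from datetime import datetime

s = ("Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 "
     "at gain setting 2 while battery state was 3.6V.")

# Capture "HH:MM:SS MM/DD/YYYY"; \b keeps the match from starting or ending mid-number
match = re.search(r'\b(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})\b', s)

# Explicit format: no guessing about the field order
dt = datetime.strptime(match.group(1), '%H:%M:%S %m/%d/%Y')
print(dt)  # 2019-02-07 21:50:00
```

If the devices actually write day/month, change the format to '%H:%M:%S %d/%m/%Y'.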
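To apply this to a whole dataframe column, use the vectorized string methods (rows with no match become NaT):
import pandas as pd  # S below is the dataframe column (a pandas Series) holding the raw strings
pd.to_datetime(S.str.extract(r' (\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4}) ', expand=False), format='%H:%M:%S %m/%d/%Y')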
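If you would rather work row by row (for example with Series.map), note that calling .group(1) on a failed re.search raises AttributeError. A minimal sketch of a None-safe helper; extract_recorded_at and TIMESTAMP_RE are hypothetical names, and month/day/year order is assumed:

```python
import re
from datetime import datetime
from typing import Optional

# Compiled once and reused for every row
TIMESTAMP_RE = re.compile(r'\b(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})\b')


def extract_recorded_at(text: object) -> Optional[datetime]:
    """Return the embedded timestamp, or None if there is none."""
    # Non-string cells (e.g. NaN in a pandas column) cannot contain a timestamp
    if not isinstance(text, str):
        return None
    match = TIMESTAMP_RE.search(text)
    if match is None:
        return None
    # Assumes month/day/year, like the format string used above
    return datetime.strptime(match.group(1), '%H:%M:%S %m/%d/%Y')


rows = [
    "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 "
    "at gain setting 2 while battery state was 3.6V.",
    "No timestamp in this line.",
]
print([extract_recorded_at(r) for r in rows])
```

With pandas this would be used as df['recorded_at'] = df['text'].map(extract_recorded_at), where 'text' is a placeholder for your column name. On large frames the vectorized str.extract / to_datetime line is generally the faster option.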