gados2000

Reputation: 31

How to extract the time and date from a non-datetime string in Python?

I have strings that look something like this:

"Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."

I attempted to use the parser from dateutil:

from dateutil.parser import parse
s = "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."
dt = parse(s, fuzzy=True)
print(dt)

However, I'm getting the following error:

Traceback (most recent call last):
  File "<string>", line 3, in <module>
  File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V.

How can I extract the time and the date from this string? Is there a regex that'll allow me to do so in a single line?

Edit: Ideally, I'm looking for a solution I can easily apply to an entire column in a pandas dataframe.

Upvotes: 3

Views: 136

Answers (1)

Scott Boston

Reputation: 153460

Use the re module to pull the timestamp out with a regex first, then pass only that substring to the parser:

from dateutil.parser import parse
import re

s = "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 at gain setting 2 while battery state was 3.6V."

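# Raw string so that \d is not treated as a string escape; "/" needs no escaping.
t = re.search(r'(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})', s).group(1)
dt = parse(t)  # fuzzy=True is no longer needed once only the timestamp is passed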
print(dt)

Output:

2019-02-07 21:50:00
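Note that 02/07/2019 is ambiguous: dateutil reads it month-first by default, which is why the output above is 7 February. If the device actually writes day/month/year (an assumption you would need to check against the device's documentation), an explicit format avoids the guess. A minimal sketch using only the standard library; the helper name extract_timestamp and its fmt parameter are hypothetical, not part of any library:

```python
import re
from datetime import datetime

# Same pattern as above: HH:MM:SS followed by NN/NN/YYYY.
PATTERN = re.compile(r"(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})")


def extract_timestamp(text, fmt="%H:%M:%S %m/%d/%Y"):
    """Return the first timestamp found in text, or None if there is none.

    fmt defaults to month-first; pass "%H:%M:%S %d/%m/%Y" for day-first.
    """
    match = PATTERN.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), fmt)


s = ("Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 "
     "at gain setting 2 while battery state was 3.6V.")

month_first = extract_timestamp(s)
day_first = extract_timestamp(s, fmt="%H:%M:%S %d/%m/%Y")
print(month_first)  # 2019-02-07 21:50:00
print(day_first)    # 2019-07-02 21:50:00
```

Returning None for strings without a timestamp is a design choice for this sketch; calling .group(1) directly on a failed re.search would raise AttributeError instead.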

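Apply to a dataframe column (here S is a pandas Series of such strings, e.g. df['col']; expand=False makes str.extract return a Series):

import pandas as pd
pd.to_datetime(S.str.extract(r'(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})', expand=False), format='%H:%M:%S %m/%d/%Y')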
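For a self-contained version of the column approach, here is a small sketch. The column names comment and recorded_at, the second log line, and the row without a timestamp are all made up for illustration. It assumes month-first dates, as in the answer, and uses errors="coerce" so that rows without a match become NaT instead of raising:

```python
import pandas as pd

# Sample data: the first row is the string from the question; the other two
# rows are invented for illustration only.
df = pd.DataFrame({
    "comment": [
        "Audio was recorded at 21:50:00 02/07/2019 (UTC) by device 243B1F05 "
        "at gain setting 2 while battery state was 3.6V.",
        "Audio was recorded at 06:15:30 03/08/2019 (UTC) by device 243B1F05 "
        "at gain setting 2 while battery state was 3.5V.",
        "No timestamp in this row.",
    ]
})

pattern = r"(\d{2}:\d{2}:\d{2} \d{2}/\d{2}/\d{4})"

# expand=False with a single capture group returns a Series; rows that do
# not match the pattern come back as NaN.
extracted = df["comment"].str.extract(pattern, expand=False)

# Assumption: dates are month-first. Swap %m and %d if the device is day-first.
df["recorded_at"] = pd.to_datetime(
    extracted, format="%H:%M:%S %m/%d/%Y", errors="coerce"
)

print(df["recorded_at"])
```

Passing an explicit format keeps the parse consistent across the whole column rather than letting pandas infer the format.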

Upvotes: 2
