Reputation: 21
I have a huge dataset and I am trying to parse its timestamp column into the following format:
YYYY-MM-DD HH:MM:SS:DEC
but the values in the dataframe look like this: 210309_131046520_11.
So I want to split that column into the following columns:
date, day, month, year, dec, hh, mm, ss
e.g. for 210309_131046520_11:
date      day  month  year  dec  hh  mm  ss
21-03-09  09   03     2021  520  13  10  46
I tried to use the datetime module, but nothing seems to work. Can someone help?
Example of my timestamp column:
0 210309_131046520_11.sgy
1 210309_131046520_05.sgy
2 210309_131046528_02.sgy
3 210309_131046528_12.sgy
4 210309_131049712_07.sgy
...
162 210309_132658584_07.sgy
163 210309_132659632_10.sgy
164 210309_132701232_05.sgy
165 210309_132704256_02.sgy
166 210309_132707136_12.sgy
Example of my date column:
0 131046520
1 131046520
2 131046528
3 131046528
4 131049712
...
162 132658584
163 132659632
164 132701232
165 132704256
166 132707136
One thing I attempted, using split:
s_df["date"] = s_df["timestamp"].apply(lambda row: row.split("_")[1])
Output:
TypeError: list indices must be integers or slices, not str
Upvotes: 2
Views: 541
Reputation: 1015
You can use strptime from the standard library's datetime module to parse your dates:
from datetime import datetime as dt
dt.strptime('210309_131045620', '%y%m%d_%H%M%S%f')
Out: datetime.datetime(2021, 3, 9, 13, 10, 45, 620000)
In a pandas dataframe, this is done more efficiently with the built-in pd.to_datetime, which uses the same format syntax:
import pandas as pd
pd.to_datetime(df['date'], format='%y%m%d_%H%M%S%f')
For details about customizing the format string, see the strftime reference: https://strftime.org
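Note that the timestamp values in the question carry a trailing "_11.sgy" part, which the format string above does not cover. Here is a minimal sketch of one way to handle that, assuming s_df is a pandas DataFrame with a string column named "timestamp" as in the question. The two-row sample frame and the regular expression are my own illustration, not something from the original data:

```python
import pandas as pd

# Small sample frame standing in for the real data (assumed layout).
s_df = pd.DataFrame(
    {"timestamp": ["210309_131046520_11.sgy", "210309_131049712_07.sgy"]}
)

# Keep only the leading "YYMMDD_HHMMSSmmm" part; the "_11.sgy" suffix is dropped.
stamp = s_df["timestamp"].str.extract(r"^(\d{6}_\d{9})", expand=False)

# Parse with the same format string as above; %f reads the 3 digits as milliseconds.
parsed = pd.to_datetime(stamp, format="%y%m%d_%H%M%S%f")

# Derive the columns asked for in the question.
s_df["date"] = parsed.dt.strftime("%y-%m-%d")
s_df["day"] = parsed.dt.day
s_df["month"] = parsed.dt.month
s_df["year"] = parsed.dt.year
s_df["hh"] = parsed.dt.hour
s_df["mm"] = parsed.dt.minute
s_df["ss"] = parsed.dt.second
s_df["dec"] = parsed.dt.microsecond // 1000  # milliseconds

print(s_df)
```

The day, month and other numeric columns come out as integers (9 rather than "09"); use parsed.dt.strftime("%d") and similar if you need zero-padded strings.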
Upvotes: 3