Reputation: 571
My .csv looks like this:
date time
0 20190101 181555700
1 20190101 181545515
where the format is YYYYMMDD for date and HHMMSSMMM for time (the last MMM are milliseconds). For example, the first row would be 2019-01-01 18:15:55.700.
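To make the layout concrete, one row can be decoded with the standard library alone by concatenating the two fields and giving strptime a matching format. This is a small sketch, assuming the time field is always the full nine digits (zero-padded hours):

```python
from datetime import datetime

# One row from the question: date is YYYYMMDD, time is HHMMSSMMM.
date_str = "20190101"
time_str = "181555700"

# In strptime, %f accepts one to six digits and pads on the right,
# so the trailing "700" becomes 700000 microseconds (= 700 ms).
ts = datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S%f")
print(ts)  # 2019-01-01 18:15:55.700000
```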
Is there a way to parse this directly in pd.read_csv() without having to convert it afterwards? Using only parse_dates does not work, as it doesn't recognize the format. What I would like is a single column in my dataframe, with the timestamp correctly parsed, like:
timestamp
0 2019-01-01 18:15:55.700
Upvotes: 3
Views: 3195
Reputation: 313
I think this is close to what you need:
import pandas as pd
import datetime as dt
data = pd.read_csv(
    './a.csv',
    delimiter='\t',
    index_col=0,
    parse_dates=[1],
    converters={'time': lambda t: dt.datetime.strptime(t, '%H%M%S%f').time()}
)
Output:
date time
0 2019-01-01 18:15:55.700000
1 2019-01-01 18:15:45.515000
After some more digging, I found this:
data = pd.read_csv(
    './a.csv',
    delimiter='\t',
    index_col=1,
    parse_dates={'datetime': [1, 2]},
    converters={'time': lambda t: dt.datetime.strptime(t, '%H%M%S%f').time()}
)
And the output is:
datetime
0 2019-01-01 18:15:55.700
1 2019-01-01 18:15:45.515
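Recent pandas releases (2.2 and later) deprecate combining several columns through the dict form of parse_dates used above. A minimal sketch that keeps the same converter idea but does the combination after reading, assuming a comma-separated file with date and time columns (the inline sample is a stand-in for ./a.csv):

```python
import datetime as dt
import io

import pandas as pd

# Stand-in for ./a.csv, using the rows from the question (comma-separated here).
csv_text = """date,time
20190101,181555700
20190101,181545515
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    # Converters receive each raw field as a string, so leading zeros survive.
    converters={
        "date": lambda d: dt.datetime.strptime(d, "%Y%m%d"),
        "time": lambda t: dt.datetime.strptime(t, "%H%M%S%f").time(),
    },
)

# str() of a datetime.time is "HH:MM:SS[.ffffff]", which to_timedelta parses;
# adding it to each date (at midnight) gives the full timestamp.
df["timestamp"] = pd.to_datetime(df["date"]) + pd.to_timedelta(df["time"].astype(str))
df = df[["timestamp"]]
print(df)
```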
Upvotes: 1
Reputation: 150785
You can use to_timedelta with the unit option to turn your time into a timedelta and add it to date:
df = pd.read_csv('file.csv', parse_dates=['date'])
df['date'] = df.date + pd.to_timedelta(df.time, unit='ms')
or:
df = pd.read_csv('file.csv')
df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.time, unit='ms')
Output:
date time
0 2019-01-03 02:25:55.700 181555700
1 2019-01-03 02:25:45.515 181545515
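If you want to keep the to_timedelta idea, the HHMMSSMMM integer has to be split into its digit groups first instead of being passed as a single millisecond count. A sketch, assuming time is read as an integer as in the snippets above (the inline sample stands in for file.csv):

```python
import io

import pandas as pd

# Stand-in for file.csv with the rows from the question.
csv_text = "date,time\n20190101,181555700\n20190101,181545515\n"
df = pd.read_csv(io.StringIO(csv_text))

# Split the HHMMSSMMM integer into its digit groups.
t = df["time"]
hours = t // 10**7            # 181555700 -> 18
minutes = t // 10**5 % 100    # 181555700 -> 15
seconds = t // 10**3 % 100    # 181555700 -> 55
millis = t % 10**3            # 181555700 -> 700

df["date"] = (
    pd.to_datetime(df["date"].astype(str), format="%Y%m%d")
    + pd.to_timedelta(hours, unit="h")
    + pd.to_timedelta(minutes, unit="m")
    + pd.to_timedelta(seconds, unit="s")
    + pd.to_timedelta(millis, unit="ms")
)
print(df)
```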
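Update per comment: the approach above treats the whole time value as a number of milliseconds (181555700 ms is 2 days 02:25:55.700), which is why the dates come out as 2019-01-03. Since time is really HHMMSSMMM, concatenate date and time as strings and parse them with an explicit format: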
df['date'] = pd.to_datetime(df.date.astype(str)+df.time.astype(str), format='%Y%m%d%H%M%S%f')
Output:
date time
0 2019-01-01 18:15:55.700 181555700
1 2019-01-01 18:15:45.515 181545515
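The same concatenation can produce the single timestamp column the question asks for. A self-contained sketch, assuming a comma-separated file; reading with dtype=str keeps the fields as text, and the zfill(9) padding is an extra safeguard (not part of the answer above) for times before 10:00 stored without a leading zero. The second sample row is hypothetical and only there to exercise that case:

```python
import io

import pandas as pd

# Stand-in for file.csv; the second row is a hypothetical time before 10:00
# written without its leading zero.
csv_text = """date,time
20190101,181555700
20190102,91545515
"""

# Read both columns as strings so nothing is reinterpreted as a number.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Pad time to nine digits (HHMMSSMMM), then parse date+time in one pass.
stamp = df["date"] + df["time"].str.zfill(9)
df = pd.DataFrame({"timestamp": pd.to_datetime(stamp, format="%Y%m%d%H%M%S%f")})
print(df)
```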
Upvotes: 5