Reputation: 1426
My time.csv is like:
end_date start_date
2017-01-01 17:00:00 2017-01-01 16:30:00
2017-01-03 17:05:00 2016-01-03 21:05:00
I want to add another column, duration, that contains the difference in hours. Here is what I have so far:
import pandas as pd
from datetime import datetime, timedelta
df_time = pd.read_csv('time.csv')
df_time["duration"] = (datetime.strptime(df_time["end_date"], '%Y-%m-%d %H:%M:%S') - \
datetime.strptime(df_time["start_date"], '%Y-%m-%d %H:%M:%S'))/ \
timedelta(hours = 1)
print(df_time["duration"].head())
But I got the following error:
TypeError: strptime() argument 1 must be str, not Series
How do I convert a Series to str so that the parse function works?
Secondly, how do I truncate the longest 1% of the durations?
Upvotes: 1
Views: 39
Reputation: 4370
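As @Quang Hoang said, you can convert the date columns from strings to Timestamp format, and then it is easy to find the duration between them. datetime.strptime accepts only a single string, whereas pd.to_datetime parses a whole Series at once, so there is no need to convert the Series to str.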
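import pandas as pd
time_data = pd.read_csv("time.csv")
# Parse both columns from strings; direct assignment gives them datetime64 dtype
time_data['end_date'] = pd.to_datetime(time_data['end_date'])
time_data['start_date'] = pd.to_datetime(time_data['start_date'])
# Subtracting yields Timedelta values; dividing by one hour gives float hours
time_data['duration'] = (time_data['end_date'] - time_data['start_date']) / pd.Timedelta(hours=1)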
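For the second part of the question (the longest 1% of durations), one possible approach is to compute the 99th percentile and filter on it. This is only a sketch: it assumes "truncate" means dropping those rows, it uses a small made-up, comma-separated sample in place of time.csv, and the names csv_text, cutoff and trimmed are illustrative.

```python
import io
import pandas as pd

# Hypothetical sample standing in for time.csv (assumed comma-separated).
csv_text = """end_date,start_date
2017-01-01 17:00:00,2017-01-01 16:30:00
2017-01-03 17:05:00,2016-01-03 21:05:00
2017-01-04 12:00:00,2017-01-04 10:00:00
2017-01-05 09:15:00,2017-01-05 09:00:00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the string columns into datetime64 values.
df["end_date"] = pd.to_datetime(df["end_date"])
df["start_date"] = pd.to_datetime(df["start_date"])

# Duration in hours as a float column.
df["duration"] = (df["end_date"] - df["start_date"]) / pd.Timedelta(hours=1)

# 99th percentile of the durations; rows above it are the longest 1%.
cutoff = df["duration"].quantile(0.99)

# Keep only the rows at or below the cutoff.
trimmed = df[df["duration"] <= cutoff]
print(trimmed)
```

If "truncate" instead means capping the values rather than dropping rows, df["duration"].clip(upper=cutoff) would keep every row and limit the longest durations to the cutoff.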
Hope it helps :)
Upvotes: 2