hydradon

Reputation: 1426

Python: get time difference between 2 time columns of dataframe and filter them

My time.csv looks like this:

end_date               start_date
2017-01-01 17:00:00    2017-01-01 16:30:00
2017-01-03 17:05:00    2016-01-03 21:05:00

I want to add another column duration that contains the difference in hours. Here is what I have so far:

import pandas as pd
from datetime import datetime, timedelta

df_time = pd.read_csv('time.csv')

df_time["duration"] = (datetime.strptime(df_time["end_date"], '%Y-%m-%d %H:%M:%S') - \
                       datetime.strptime(df_time["start_date"], '%Y-%m-%d %H:%M:%S'))/ \
                       timedelta(hours = 1)

print(df_time["duration"].head())

But I got the following error:

TypeError: strptime() argument 1 must be str, not Series

How do I convert a Series to str so that the parse function works? Secondly, how do I truncate the top 1% longest durations?

Upvotes: 1

Views: 39

Answers (1)

Arun AK

Reputation: 4370

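As @Quang Hoang said, you can convert both columns from strings to Timestamps with pd.to_datetime(), which works on a whole Series (unlike datetime.strptime(), which accepts only a single string). After that, the duration is simply the difference between the two columns.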

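import pandas as pd

time_data = pd.read_csv("time.csv")

# Parse both columns from strings into datetime values.
# Plain column assignment is used so the column dtype actually changes.
time_data['end_date'] = pd.to_datetime(time_data['end_date'])
time_data['start_date'] = pd.to_datetime(time_data['start_date'])

# Subtracting two datetime columns gives a Timedelta column
time_data['duration'] = time_data['end_date'] - time_data['start_date']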
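
For the other two parts of the question (duration in hours, and removing the longest 1%), here is a minimal sketch. It builds the two sample rows inline instead of reading time.csv, and it assumes that "truncate" means dropping the rows whose duration lies above the 99th percentile; the names cutoff and filtered are just illustrative.

```python
import pandas as pd

# Sample rows from the question, built inline (stand-in for time.csv)
time_data = pd.DataFrame({
    "end_date": ["2017-01-01 17:00:00", "2017-01-03 17:05:00"],
    "start_date": ["2017-01-01 16:30:00", "2016-01-03 21:05:00"],
})

time_data["end_date"] = pd.to_datetime(time_data["end_date"])
time_data["start_date"] = pd.to_datetime(time_data["start_date"])

# Convert the Timedelta difference into a float number of hours
delta = time_data["end_date"] - time_data["start_date"]
time_data["duration"] = delta.dt.total_seconds() / 3600

# Assumption: "truncate the top 1%" means dropping rows above the 99th percentile
cutoff = time_data["duration"].quantile(0.99)
filtered = time_data[time_data["duration"] <= cutoff]

print(filtered)
```

If you would rather keep every row and cap the long durations instead, time_data["duration"].clip(upper=cutoff) might be closer to what you want.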


Hope it helps :)

Upvotes: 2
