Python: get time difference between 2 time columns of dataframe and filter them

Question

My time.csv is like:

end_date               start_date
2017-01-01 17:00:00    2017-01-01 16:30:00
2017-01-03 17:05:00    2016-01-03 21:05:00

I want to add another column duration that contains the difference in hours. Here is what I have so far:

import pandas as pd
from datetime import datetime, timedelta

df_time = pd.read_csv('time.csv')

df_time["duration"] = (datetime.strptime(df_time["end_date"], '%Y-%m-%d %H:%M:%S') - \
                       datetime.strptime(df_time["start_date"], '%Y-%m-%d %H:%M:%S'))/ \
                       timedelta(hours = 1)

print(df_time["duration"].head())

But I got the following error:

TypeError: strptime() argument 1 must be str, not Series

How do I convert a Series to str so that the parse function works? Secondly, how do I drop the rows with the top 1% longest durations?

Arun AK · Accepted Answer

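As @Quang Hoang said, convert both columns from strings to Timestamps with pd.to_datetime, which works on a whole Series (datetime.strptime only accepts a single string, hence the TypeError). Once both columns are datetimes, the duration is simply the difference between them.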

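import pandas as pd

time_data = pd.read_csv("time.csv")

# Parse both columns into datetime64. Plain column assignment is used so the
# dtype actually changes; assigning via .loc[:, col] can leave it as object.
time_data['end_date'] = pd.to_datetime(time_data['end_date'])
time_data['start_date'] = pd.to_datetime(time_data['start_date'])

# Dividing a Timedelta column by one hour gives the duration in hours as floats.
time_data['duration'] = (time_data['end_date'] - time_data['start_date']) / pd.Timedelta(hours=1)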
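The question also asks how to drop the top 1% longest durations. Here is a minimal sketch of one way to do it, assuming "top 1%" means rows above the 99th percentile of duration. The 200-row DataFrame is made-up illustrative data, not the contents of time.csv, and the names cutoff and df_trimmed are only for this example.

```python
import pandas as pd

# Illustrative data (not from time.csv): 200 rows with durations of 1..200 hours.
start = pd.Timestamp("2017-01-01 00:00:00")
df_time = pd.DataFrame({
    "start_date": [start] * 200,
    "end_date": [start + pd.Timedelta(hours=h) for h in range(1, 201)],
})

# Duration in hours as a float column.
df_time["duration"] = (df_time["end_date"] - df_time["start_date"]) / pd.Timedelta(hours=1)

# 99th percentile of the durations; rows above it are treated as the top 1%.
cutoff = df_time["duration"].quantile(0.99)

# Keep only the rows at or below the cutoff.
df_trimmed = df_time[df_time["duration"] <= cutoff]
```

Series.quantile interpolates linearly by default, so the cutoff may fall between two observed values. If ties at the boundary matter for your data, decide whether <= or < is the right comparison.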

Hope it helps :)
