Rinat Devyatyarov

Reputation: 25

How to calculate the median time in pandas?

I have a DataFrame and need to calculate the median time, in hh:mm:ss format, for each user.

My code:

file['date'] = pd.to_datetime(file['date'])
# calculate the time difference between consecutive operations
file['diff'] = file.groupby(['operation', 'user'])['date'].diff().fillna(pd.Timedelta(0))
# keep only rows where the difference is at most 20 seconds
file = file.loc[file['diff'] <= '00:00:20']
# calculate the median
results = file.groupby(['operation', 'user'])['diff'].median()

Python raises an error: No numeric types to aggregate...

Upvotes: 1

Views: 427

Answers (1)

jezrael

Reputation: 863301

Use Series.dt.total_seconds to convert the timedeltas to seconds, replace NaN with 0, and then compare against 20:

file['date'] = pd.to_datetime(file['date'])

#calculate the time difference between operations
file['diff']=file.groupby(['operation','user'])['date'].diff().dt.total_seconds().fillna(0)

file = file.loc[file['diff'] <= 20] # keep only differences of at most 20 seconds

results = file.groupby(['operation', 'user'])['diff'].median()


# results is a Series, so group on the 'operation' index level and take the median directly
result_of_operation = results.groupby('operation').median()
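As a rough check of the steps above, here is a minimal sketch on a small made-up DataFrame. The sample rows are an assumption for illustration only; the column names (operation, user, date) follow the question. Because results is a Series, the last step groups on the index level and calls median() directly.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
file = pd.DataFrame({
    "operation": ["open", "open", "open", "open", "save", "save", "save"],
    "user": ["a", "a", "a", "b", "a", "a", "b"],
    "date": [
        "2021-01-01 10:00:00",
        "2021-01-01 10:00:05",
        "2021-01-01 10:00:30",
        "2021-01-01 10:00:00",
        "2021-01-01 11:00:00",
        "2021-01-01 11:00:10",
        "2021-01-01 11:00:00",
    ],
})
file["date"] = pd.to_datetime(file["date"])

# Difference to the previous row within each (operation, user) group, in seconds.
# The first row of every group has no predecessor, so its NaN becomes 0.
file["diff"] = (
    file.groupby(["operation", "user"])["date"]
    .diff()
    .dt.total_seconds()
    .fillna(0)
)

# Keep only differences of at most 20 seconds (drops the 25-second gap).
file = file.loc[file["diff"] <= 20]

# Median per (operation, user), then median of those medians per operation.
results = file.groupby(["operation", "user"])["diff"].median()
result_of_operation = results.groupby("operation").median()

print(results)
print(result_of_operation)
```

Working in plain float seconds sidesteps the "No numeric types to aggregate" error, since the median is then computed on an ordinary numeric column.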

If you need timedeltas again, convert the result back with to_timedelta:

result_of_operation = pd.to_timedelta(result_of_operation, unit='s')
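The question asks for hh:mm:ss output, so one possible way to render the converted timedeltas as strings is sketched below. The helper format_hms and the sample values are hypothetical, not part of pandas or of the original answer, and the sketch assumes non-negative durations and truncates fractional seconds.

```python
import pandas as pd


def format_hms(td: pd.Timedelta) -> str:
    """Hypothetical helper: render a non-negative Timedelta as hh:mm:ss."""
    total = int(td.total_seconds())  # fractional seconds are truncated
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Made-up median values in seconds, indexed by operation.
medians_in_seconds = pd.Series([2.5, 75.0], index=["open", "save"])
medians = pd.to_timedelta(medians_in_seconds, unit="s")

# Iterating a timedelta Series yields pd.Timedelta objects.
formatted = pd.Series([format_hms(td) for td in medians], index=medians.index)
print(formatted)
```

If sub-second precision matters, keeping the Timedelta values as they are and formatting only for display is probably the safer choice.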

Finally, if you need a DataFrame:

result_of_operation = result_of_operation.reset_index()

Upvotes: 1
