Reputation: 25
I have a DataFrame in which I need to calculate the median time, in hh:mm:ss format, for each user.
My code:
file['date'] = pd.to_datetime(file['date'])
file['diff'] = file.groupby(['operation', 'user'])['date'].diff().fillna(pd.Timedelta(0)) #calculate the time difference between operations
file = file.loc[file['diff'] <= '00:00:20'] # keep only rows where the difference is at most 20 seconds
results = file.groupby(['operation', 'user'])['diff'].median() # calculate the median
Python raises an error: No numeric types to aggregate...
Upvotes: 1
Views: 427
Reputation: 863301
Use Series.dt.total_seconds to convert the timedeltas to seconds, replace NaN with 0, and compare against 20:
file['date'] = pd.to_datetime(file['date'])
#calculate the time difference between operations
file['diff']=file.groupby(['operation','user'])['date'].diff().dt.total_seconds().fillna(0)
file = file.loc[file['diff'] <= 20] # keep only rows where the difference is at most 20 seconds
results = file.groupby(['operation', 'user'])['diff'].median()
# results is a Series indexed by (operation, user), so group on the index level directly
result_of_operation = results.groupby(level='operation').median()
If you need timedeltas, convert the seconds back with to_timedelta:
result_of_operation = pd.to_timedelta(result_of_operation, unit='s')
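Since the question asks for hh:mm:ss output, one possible way to render the median seconds as strings is sketched below. The helper name `format_hms` and the sample values are made up for illustration and are not part of pandas; the sketch also assumes non-negative durations and simply truncates fractional seconds.

```python
import pandas as pd

def format_hms(seconds: float) -> str:
    """Format a non-negative number of seconds as hh:mm:ss (fractions truncated)."""
    total = int(seconds)                 # drop fractional seconds
    hours, rem = divmod(total, 3600)     # whole hours, leftover seconds
    minutes, secs = divmod(rem, 60)      # whole minutes, leftover seconds
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# Hypothetical median values in seconds, standing in for result_of_operation
medians = pd.Series([7.5, 12.0, 3725.0], name="diff")
formatted = medians.map(format_hms)
print(formatted.tolist())
```

Keeping the numeric seconds for the aggregation and formatting only at the end avoids aggregating on strings.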
Finally, if you need a DataFrame:
result_of_operation = result_of_operation.reset_index()
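To check the approach end to end, here is a minimal sketch on a small made-up dataset. The operations, users, and timestamps are invented for illustration; only the column names come from the question. The per-operation step groups on the index level, because the per-user result is a Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical sample data: three (operation, user) groups
file = pd.DataFrame({
    "operation": ["open"] * 6 + ["close"] * 3,
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u1", "u1", "u1"],
    "date": [
        "2021-01-01 10:00:00", "2021-01-01 10:00:05",
        "2021-01-01 10:00:15", "2021-01-01 10:01:00",
        "2021-01-01 10:00:00", "2021-01-01 10:00:08",
        "2021-01-01 11:00:00", "2021-01-01 11:00:12",
        "2021-01-01 11:00:18",
    ],
})

file["date"] = pd.to_datetime(file["date"])

# Seconds since the previous event in the same (operation, user) group;
# the first event of each group has no predecessor, so NaN becomes 0
file["diff"] = (
    file.groupby(["operation", "user"])["date"]
        .diff()
        .dt.total_seconds()
        .fillna(0)
)

# Keep only gaps of at most 20 seconds
file = file.loc[file["diff"] <= 20]

# Median gap per (operation, user), in seconds
results = file.groupby(["operation", "user"])["diff"].median()

# Median of the per-user medians for each operation, converted to timedeltas
result_of_operation = pd.to_timedelta(
    results.groupby(level="operation").median(), unit="s"
)

print(results)
print(result_of_operation)
```

Note that the zeros filled in for each group's first event pass the 20-second filter and therefore take part in the median, just as in the code above.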
Upvotes: 1