Reputation:
I have a df with information how many transfers have been made in the next 10 minutes. I would like to show which banks are the most popular at specific hours (I conclude that I will be able to show this thanks to the median). My pivot is like that:
bank_name bank1 bank2 bank3 bank4
date
2019-11-03 102 105 78 81
00:00
2019-11-03 108 100 103 77
00:10
2019-11-03
00:20 108 134 55 27
... ... ... ... ...
2019-12-22
15:30 461 312 312 253
2019-12-22
15:40 396 361 376 229
Or regular df
date bank_name transfers
2019-11-03 00:00 bank1 102
2019-11-03 00:00 bank2 105
2019-11-03 00:00 bank3 78
2019-11-03 00:00 bank4 81
2019-11-03 00:10 bank1 108
2019-11-03 00:10 bank2 100
... ... ...
My expected out (I entered the median values at random)
hour bank_name median
00 bank2 641
01 bank2 711
02 bank1 668
... ... ...
23 bank3 757
At the beginning I I would like to sum the values from 2019-11-03 00:00, 00:10, 00:20, 00:30, 00:40, 00:50
and take it as a value 03 00
. I did it like this:
df['date_'] = pd.to_datetime(df['date'].dt.strftime('%d %H'))
df = df.set_index('bank_name').groupby([ 'bank_name', 'date_']).agg({'transfers':np.sum})
... but I don't know what to do next. I will be grateful for your help.
Upvotes: 0
Views: 479
Reputation: 93191
Try this:
# median hourly transfer
hourly_transfers = df.groupby([pd.Grouper(key='date', freq='H'), 'bank_name']).median()
# which bank has the highest median in each hour
idx = hourly_transfers.groupby('date')['transfers'].idxmax()
# the result
hourly_transfers.loc[idx]
Upvotes: 1
Reputation: 854
Here's how I would do it.
import pandas as pd
df.groupby([pd.to_datetime(df['date']).dt.hour, 'bank_name'])['transfers'].median()
Upvotes: 1
Reputation: 490
So it sounds like you need to:
Something like this:
import datetime
df['date_hourly'] = df['date'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour)) // [From here][1]
df_hourly = df.groupby(['date_hourly', 'bank_name']).transfers.sum().reset_index()
df_hourly['hour'] = df_hourly.date_hourly.apply(lambda x: x.hour)
df_pivot = df_hourly.groupby(['hour', 'bank_name']).transfers.median().reset_index()
Upvotes: 0