Reputation: 1991
I would like to bucket the count of Date_Time
over a 24 hour clock but also sum the associated Amt
for each bucket. This is only being done for the highest volume book. The bucketting code is done for the 24 hours, just need assistance with the summed Amt
Dataframe:
import pandas as pd
import numpy as np
df_Highest_Traded_Away_Book = [
('Book', ['A', 'A','A','A','B','C','C','C']),
('Amt', ['10', '10', '10', '10', '20', '30', '30', '30']),
('Date_Time', ['2018-09-03 01:06:09', '2018-09-08 01:23:29',
'2018-09-15 02:23:29','2018-09-20 03:23:29',
'2018-09-20 00:23:29','2018-09-25 01:23:29',
'2018-09-25 02:23:29','2018-09-30 02:23:29',])
]
Get highest volume book
df_Highest_Traded_Away_Book = pd.DataFrame.from_items(df_Highest_Traded_Away_Book)
df_Highest_Traded_Away_Book['Date_Time'] = pd.to_datetime(df_Highest_Traded_Away_Book['Date_Time'])
df_Highest_Traded_Away_Book['Time_in_GMT'] = df_Highest_Traded_Away_Book['Date_Time'].dt.hour
print(df_Highest_Traded_Away_Book)
df_Highest_Book = df_Highest_Traded_Away_Book.groupby(['Book']).size().idxmax()
print(df_Highest_Book)
Bucket times over the 24 hour period:
df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book['Book']
.eq(df_Highest_Book)
.groupby(df_Highest_Traded_Away_Book['Time_in_GMT'])
.sum()
.astype(int)
.reindex(np.arange(25), fill_value=0)
.to_frame(df_Highest_Book))
print(df_Highest_Traded_Away_Book )
A
Time_in_GMT
0 0
1 2
2 1
3 1
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
Output required:
A
Time_in_GMT Sum Amt
0 0 0
1 2 20
2 1 10
3 1 10
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
21 0 0
22 0 0
23 0 0
24 0 0
Upvotes: 1
Views: 74
Reputation: 863301
First filter by boolean indexing
only df_Highest_Book
values and then aggregate by agg
with size
and sum
:
#convert column to integers
df_Highest_Traded_Away_Book['Amt'] = df_Highest_Traded_Away_Book['Amt'].astype(int)
df_Highest_Traded_Away_Book = (df_Highest_Traded_Away_Book[df_Highest_Traded_Away_Book['Book']
.eq(df_Highest_Book)]
.groupby('Time_in_GMT')
.agg({'Time_in_GMT':'size','Amt':'sum'})
.reindex(np.arange(25), fill_value=0)
.rename(columns={'Time_in_GMT':'Count','Amt':'Sum Amt'})
)
print(df_Highest_Traded_Away_Book)
Count Sum Amt
Time_in_GMT
0 0 0
1 2 20
2 1 10
3 1 10
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
13 0 0
14 0 0
15 0 0
16 0 0
17 0 0
18 0 0
19 0 0
20 0 0
21 0 0
22 0 0
23 0 0
24 0 0
Upvotes: 1