Reputation: 481
I have the data frame :
PRODUCT SPEED HEIGHT LENGTH DATE
30 10 5 8 2019-08
30 13 9 15 2019-08
31 19 8 12 2019-08
30 5 6 3 2019-08
31 11 8 6 2019-09
30 11 8 6 2019-09
30 11 8 6 2019-09
31 11 8 6 2019-09
31 11 8 6 2019-09
I want to produce a table that has the averages for SPEED, HEIGHT, LENGTH by month. I also want to have a column in that table that has the percentage of the total PRODUCT per month when PRODUCT=30.
My desired output:
DATE SPEED HEIGHT LENGTH PRODUCT
2019-08 11.75 7 9.5 75%
2019-09 11 8 6 40%
I have the following code:
avg_summary = df.groupby(['DATE']).agg({'SPEED': 'mean',
'HEIGHT' : 'mean',
'LENGTH': 'mean',
'PRODUCT': 'count'}).reset_index()
This outputs:
DATE SPEED HEIGHT LENGTH PRODUCT
2019-08 11.75 7 9.5 4
2019-09 11 8 6 5
I know I can do (df[df.PRODUCT == 30].shape[0])
to find the count in the whole data frame for when PRODUCT == 30, but I cannot figure out how to count the number of instances for when PRODUCT=30 by month.
I figured I could divide the number of instances for when PRODUCT=30 by the total count of PRODUCT to get the result I am looking for.
Any help/tips would be very appreciated! Thanks.
Upvotes: 1
Views: 60
Reputation: 1167
You could also write an inline function to pass in for the product column. An f-string can be used if you want the % sign in the column:
df = (df.groupby('DATE', as_index=False)
.agg({'SPEED': 'mean',
'HEIGHT': 'mean',
'LENGTH': 'mean',
'PRODUCT': lambda x: f'{100*(x.eq(30).sum()/len(x)):.0f}%'})
)
print(df)
DATE SPEED HEIGHT LENGTH PRODUCT
0 2019-08 11.75 7 9.5 75%
1 2019-09 11.00 8 6.0 40%
Upvotes: 1
Reputation: 64
You could combine the two things that you were talking about and I think that this will get you what you want.
counts = df[df['PRODUCT']==30].groupby(['DATE']).agg({'PRODUCT': 'count'}).reset_index()
avg_summary['PRODUCT'] = (counts['PRODUCT'] / avg_summary['PRODUCT'])*100
First off you filter on the PRODUCT=30 and then you group by DATE, which gives you the counts by month of PRODUCT=30.
Upvotes: 1
Reputation: 42886
You were quite close, after your groupby, filter your data on PRODUCT == 30
, groupby on date again and then divide the size product by the original avg_summary
:
s = df.query('PRODUCT==30').groupby('DATE')['PRODUCT'].size().to_numpy()
avg_summary['PRODUCT'] = s / avg_summary['PRODUCT'] * 100
output
DATE SPEED HEIGHT LENGTH PRODUCT
0 2019-08 11.75 7 9.5 75.0
1 2019-09 11.00 8 6.0 40.0
Upvotes: 2