Reputation: 118
My dataframe looks something like this (but with about 100,000 rows of data):
ID,Total,TotalDate,DaysBtwRead,Type,YearlyAvg
1,1250,6/2/2017,17,AT267,229
2,1670,2/3/2012,320,PQ43,50
I'm trying to groupby yearly average totals using
df.groupby(pd.cut(df['YearlyAvg'], np.arange(0,1250,50))).count()
so that I can set-up for a unique monte carlo distribution, but I need these grouped by each individual Type as well. This currently only counts each range regardless of any other values.
Rather than having an overall aggregate count, I'm trying to set my code up so that the output looks more like the following (with YearlyAvg containing a count of each range)
Index,YearlyAvg
AT267(0, 50], 200
PQ43(0, 50], 123
AT267(50, 100], 49
PQ43(50, 100], 67
Is there an easier way to do this outside of creating a separate dataframe for each Type value?
Upvotes: 1
Views: 728
Reputation: 323226
You can using unstack
with stack
df['bins']=pd.cut(df['YearlyAvg'], np.arange(0,1250,50))
df.groupby(['Type','bins']).size().unstack(fill_value=0).stack()# also here will create the multiple index for achieve what you need
Out[1783]:
Type bins
AT267 (0, 50] 0
(200, 250] 1
PQ43 (0, 50] 1
(200, 250] 0
dtype: int64
Upvotes: 2