Reputation: 501
I'm facing a little challenge with pandas which i'm having a hard to time figure out.
I have created two dataframes with the following code
df5 = dataFrame[['PdDistrict' , 'Category']]
df5 = df5[pd.notnull(df5['PdDistrict'])]
df5 = df5.groupby(['Category', 'PdDistrict']).size()
df5 = df5.reset_index()
df5 = df5.sort_values(['PdDistrict',0], ascending=False)
df6 = df5.groupby('PdDistrict')[0].sum()
df6 = df6.reset_index()
This gives me two dataframes. df5 contains how many times a specific category occurs in a given district. e.g.
'Category' 'PdDistrict' 'count'
Drugs Bayview 200
Theft Bayview 200
Gambling Bayview 200
Drugs CENTRAL 300
Theft CENTRAL 300
Gambling CENTRAL 300
the df6 frame contains the total number of categories for a given PdDistrict.
This gives df6 the following look
'PdDistrict' 'total count'
Bayview 600
CENTRAL 900
Now what I want is that df5 gets to look like this e.g:
'Category' 'PdDistrict' 'count' 'Average'
Drugs Bayview 200 0.33
Theft Bayview 200 0.33
Gambling Bayview 200 0.33
Drugs CENTRAL 200 0.22
Theft CENTRAL 200 0.22
Gambling CENTRAL 200 0.22
So it's basically taking count from df5 and dividing it by totalcount from df6, but for the same district. How can I do this?
res = df5.set_index('PdDistrict', append = False) / df6.set_index('PdDistrict', append = False)
The above gives me NaN in Category.
Upvotes: 1
Views: 928
Reputation: 394021
You could add the total count
col to your first df and then you can perform the calculation:
In [45]:
df['total count'] = df['PdDistrict'].map(df1.set_index('PdDistrict')['total count'])
df
Out[45]:
Category PdDistrict count total count
0 Drugs Bayview 200 600
1 Theft Bayview 200 600
2 Gambling Bayview 200 600
3 Drugs CENTRAL 300 900
4 Theft CENTRAL 300 900
5 Gambling CENTRAL 300 900
In [46]:
df['Average'] = df['count']/df['total count']
df
Out[46]:
Category PdDistrict count total count Average
0 Drugs Bayview 200 600 0.333333
1 Theft Bayview 200 600 0.333333
2 Gambling Bayview 200 600 0.333333
3 Drugs CENTRAL 300 900 0.333333
4 Theft CENTRAL 300 900 0.333333
5 Gambling CENTRAL 300 900 0.333333
Upvotes: 2