Reputation: 87
I have a dataframe with Yes/no answer in the column 'quality','price','time'.
I transformed in 1 and 0 and grouped
grouped = df.group.by(['country'])[['quality','price','time']].sum()
to get only the 'Yes' answers and the result is:
country | quality | price | time |
---|---|---|---|
FRANCE | 5 | 4 | 3 |
GERMANY | 3 | 2 | 6 |
UK | 2 | 1 | 4 |
I would like to know how to divide each values in groupby for the size(N total) of each country respondents, in my case FRANCE = 9 , GERMANY = 11, UK = 12.
I know that i can select single group and make operations with it:
france = country_split.loc[['FRANCE']]
(france/9)*100
but it is possible to make operations for the entire group in one time?
Upvotes: 1
Views: 681
Reputation: 862901
Use Series.value_counts
for counts and divide values of columns after aggregate sum
, then multiple by 100
:
#if need dict for counts
#s = {'FRANCE': 9, 'GERMANY': 11, 'UK': 12}
s = df['country'].value_counts()
grouped = df.groupby(['country'])[['quality','price','time']].sum().div(s, axis=0).mul(100)
print (grouped)
quality price time
country
FRANCE 55.555556 44.444444 33.333333
GERMANY 27.272727 18.181818 54.545455
UK 16.666667 8.333333 33.333333
Not tested, but possible solution should be aggregate mean
instead sum
:
grouped = df.groupby(['country'])[['quality','price','time']].mean()
Upvotes: 1