rosefun
rosefun

Reputation: 1857

How to get the average value after grouping by columns?

I have a DataFrame named df, and I want to get the average usage time of different apps by different gender groups.

import pandas as pd 
df=pd.DataFrame({'user':[2,3,4,4,5,5],'gender':[0,0,1,1,1,1],
'app':['k','k','k','k','s','s'],'time':[6,10,10,6,3,1]})

Input:

  app  gender  time  user
0   k    0     6     2
1   k    0    10     3
2   k    1    10     4
3   k    1     6     4
4   s    1     3     5
5   s    1     1     5

For app k, the total time for the gender 0 group to use app k is 16 (10 + 6 ), so the average usage time 0_k is 8.0.

And the total time for the gender 1 group to use app k is 16 (10 + 6 + 0 + 0), so the average usage time 1_k is 4.0.

Expected:

dict = {'0_k': 8.0, '0_s': 0, '1_k': 4.0, '1_s': 1.0}

Upvotes: 0

Views: 53

Answers (2)

george mano
george mano

Reputation: 6168

(df.groupby(["app", "gender"]).sum()/df.groupby(["gender"]).count()).time


app  gender
k    0         8.0
     1         4.0
s    1         1.0

To trasform it to a dictionary:

dict = (df.groupby(["app", "gender"]).sum()/df.groupby(["gender"]).count()).time.to_dict()

{('k', 0): 8.0, ('s', 1): 2.0, ('k', 1): 8.0}

Upvotes: 0

Space Impact
Space Impact

Reputation: 13255

IIUC I think you need:

df['new_col'] = df.gender.astype(str)+'_'+df.app
df['Average'] = df.groupby(['gender','app'])['time'].transform('sum')/\
                df.groupby(['gender'])['time'].transform('count')

print(df)
   user  gender app  time new_col  Average
0     2       0   k     6     0_k      8.0
1     3       0   k    10     0_k      8.0
2     4       1   k    10     1_k      4.0
3     4       1   k     6     1_k      4.0
4     5       1   s     3     1_s      1.0
5     5       1   s     1     1_s      1.0

d = dict(df[['new_col','Average']].values)

print(d)
{'0_k': 8.0, '1_k': 4.0, '1_s': 1.0}

Upvotes: 2

Related Questions