Mayank Gera
Mayank Gera

Reputation: 1

Pandas group sum divided by unique items in group

I have a data in excel of employees and no. of hours worked in a week. I tagged each employee to a project he/she is working on. I can get sum of hours worked in each project by doing groupby as below:

util_breakup_sum = df[["Tag", "Bill. Hours"]].groupby("Tag").sum()
    Bill. Hours
Tag                 
A61H           92.00
A63B          139.75
An             27.00
B32B           33.50
H              37.00
Manager         8.00
PP             23.00
RP0117         38.50
Se             37.50

However, when I try to calculate average time spent on each project per person, it gives me (sum/ total number of entries by employee), whereas correct average should be (sum / unique employee in group). Example of mean is given below:

util_breakup_mean = df[["Tag", "Bill. Hours"]].groupby("Tag").mean()
     Bill. Hours
Tag                 
A61H        2.243902
A63B        1.486702
An          1.000000
B32B        0.712766
H           2.055556
Manager     0.296296
PP          1.095238
RP0117      1.425926
Se          3.750000  

For example, group A61H has just two employees, so there average should be (92/2) = 46. However, the code is dividing by total number of entries by these employees and hence giving an average of 2.24.

How to get the average from unique employee names in the group?

Upvotes: 0

Views: 487

Answers (2)

Georgina Skibinski
Georgina Skibinski

Reputation: 13387

Try:

df.groupby("Tag")["Bill. Hours"].sum().div(df.groupby("Tag")["Employee"].nunique()

Where Employee is column identifying employees.

Upvotes: 3

BENY
BENY

Reputation: 323266

You can try nunique

util_breakup_mean = util_breakup_sum/df.groupby("Tag")['employee'].nunique()

Upvotes: 1

Related Questions