Reputation: 1
I have a data in excel of employees and no. of hours worked in a week. I tagged each employee to a project he/she is working on. I can get sum of hours worked in each project by doing groupby as below:
util_breakup_sum = df[["Tag", "Bill. Hours"]].groupby("Tag").sum()
Bill. Hours
Tag
A61H 92.00
A63B 139.75
An 27.00
B32B 33.50
H 37.00
Manager 8.00
PP 23.00
RP0117 38.50
Se 37.50
However, when I try to calculate average time spent on each project per person, it gives me (sum/ total number of entries by employee), whereas correct average should be (sum / unique employee in group). Example of mean is given below:
util_breakup_mean = df[["Tag", "Bill. Hours"]].groupby("Tag").mean()
Bill. Hours
Tag
A61H 2.243902
A63B 1.486702
An 1.000000
B32B 0.712766
H 2.055556
Manager 0.296296
PP 1.095238
RP0117 1.425926
Se 3.750000
For example, group A61H has just two employees, so there average should be (92/2) = 46. However, the code is dividing by total number of entries by these employees and hence giving an average of 2.24.
How to get the average from unique employee names in the group?
Upvotes: 0
Views: 487
Reputation: 13387
Try:
df.groupby("Tag")["Bill. Hours"].sum().div(df.groupby("Tag")["Employee"].nunique()
Where Employee
is column identifying employees.
Upvotes: 3
Reputation: 323266
You can try nunique
util_breakup_mean = util_breakup_sum/df.groupby("Tag")['employee'].nunique()
Upvotes: 1