Reputation: 635
I want to find the ratio of the counts of values in a group to the total values in the group while also keeping the other columns. I used a group by to transform my matrix into one similar to the example below. I grouped by the injury time and then the incident type to find the count of each incident per month.
Instead of count though, I want it to be the count/total count of incident for the month.
For example if there is a data frame that looks like this.
Injury_Time Incident_Type Count
2017-01 Slip 4
2017-01 Concussion 12
2017-01 Struck by 19
2017-01 Exposure 5
2017-02 Slip 28
2017-02 Concussion 10
2017-02 Struck by 2
2017-02 Exposure 10
... ... ...
Instead I want the data frame to look like this.
Injury_Time Incident_Type Count
2017-01 Slip 0.1
2017-01 Concussion 0.3
2017-01 Struck by 0.475
2017-01 Exposure 0.125
2017-02 Slip 0.56
2017-02 Concussion 0.2
2017-02 Struck by 0.04
2017-02 Exposure 0.2
... ... ...
For example for the first Slip incident on 2017-01. It would be calculated as 4/40 = 0.1 since the group total is (4 + 12 + 19 + 5 = 40). For the second group value of slip it would be 28/50 since (28 + 10 + 2 + 10 = 50), so the first value is 28/50 = 0.56. The same will be done for each value in each group as well.
Is there a good method of doing this for each group in the data frame?
Here is the code for creating the example data frame.
df = pd.DataFrame([["2017-01", "Slip", 4], ["2017-01", "Concussion", 12], ["2017-01", "Struck by", 19], ["2017-01", "Exposure", 5], ["2017-02", "Slip", 28], ["2017-02", "Concussion", 10], ["2017-02", "Struck by", 2], ["2017-02", "Exposure", 10]], columns=["Injury_Time", "Incident_Type", "Count"])
Please let me know if you have any questions.
Thank you for your help.
Upvotes: 1
Views: 582
Reputation: 375675
You can use transform here:
In [11]: df.groupby("Injury_Time")["Count"].transform("sum")
Out[11]:
0 40
1 40
2 40
3 40
4 50
5 50
6 50
7 50
Name: Count, dtype: int64
In [12]: df["Count"] / df.groupby("Injury_Time")["Count"].transform("sum")
Out[12]:
0 0.100
1 0.300
2 0.475
3 0.125
4 0.560
5 0.200
6 0.040
7 0.200
Name: Count, dtype: float64
See split-apply-combine section of the docs.
Upvotes: 2