mrsquid
mrsquid

Reputation: 635

Python Pandas Ratio of values in group to group total for each group

I want to find the ratio of the counts of values in a group to the total values in the group while also keeping the other columns. I used a group by to transform my matrix into one similar to the example below. I grouped by the injury time and then the incident type to find the count of each incident per month.

Instead of count though, I want it to be the count/total count of incident for the month.

For example if there is a data frame that looks like this.

 Injury_Time       Incident_Type     Count
 2017-01           Slip              4
 2017-01           Concussion        12 
 2017-01           Struck by         19
 2017-01           Exposure          5
 2017-02           Slip              28
 2017-02           Concussion        10
 2017-02           Struck by         2
 2017-02           Exposure          10
 ...               ...               ...

Instead I want the data frame to look like this.

 Injury_Time       Incident_Type     Count
 2017-01           Slip              0.1
 2017-01           Concussion        0.3 
 2017-01           Struck by         0.475
 2017-01           Exposure          0.125
 2017-02           Slip              0.56
 2017-02           Concussion        0.2
 2017-02           Struck by         0.04
 2017-02           Exposure          0.2
 ...               ...               ...

For example for the first Slip incident on 2017-01. It would be calculated as 4/40 = 0.1 since the group total is (4 + 12 + 19 + 5 = 40). For the second group value of slip it would be 28/50 since (28 + 10 + 2 + 10 = 50), so the first value is 28/50 = 0.56. The same will be done for each value in each group as well.

Is there a good method of doing this for each group in the data frame?

Here is the code for creating the example data frame.

df = pd.DataFrame([["2017-01", "Slip", 4], ["2017-01", "Concussion", 12], ["2017-01", "Struck by", 19], ["2017-01", "Exposure", 5], ["2017-02", "Slip", 28], ["2017-02", "Concussion", 10], ["2017-02", "Struck by", 2], ["2017-02", "Exposure", 10]], columns=["Injury_Time", "Incident_Type", "Count"])

Please let me know if you have any questions.

Thank you for your help.

Upvotes: 1

Views: 582

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375675

You can use transform here:

In [11]: df.groupby("Injury_Time")["Count"].transform("sum")
Out[11]:
0    40
1    40
2    40
3    40
4    50
5    50
6    50
7    50
Name: Count, dtype: int64

In [12]: df["Count"] / df.groupby("Injury_Time")["Count"].transform("sum")
Out[12]:
0    0.100
1    0.300
2    0.475
3    0.125
4    0.560
5    0.200
6    0.040
7    0.200
Name: Count, dtype: float64

See split-apply-combine section of the docs.

Upvotes: 2

Related Questions