Konkster
Konkster

Reputation: 11

Group by and then add a column for ratio based on condition

Say my dataframe in R looks like the one below. Sex is male/female. Familysize is the number of family members with the same surname. Surname is the surname.

Sex         FamilySize  Surname  
male        1           Abbing  
female      3           Abbott  
male        3           Abbott  
male        3           Abbott  
male        1           Abelseth  
female      1           Abelseth  
male        2           Abelson  
female      2           Abelson  
male        1           Abrahamsson  
female      1           Abrahim 

I want to add a new column FemaleToFamilySizeRatio, that will give me the ratio of the number of Females in each family. The results would look like below:

Sex         FamilySize  Surname     Ratio  
male        1           Abbing      0  
female      3           Abbott      0.33  
male        3           Abbott      0.33  
male        3           Abbott      0.33  
male        1           Abelseth    0.5  
female      1           Abelseth    0.5  
male        2           Abelson     0.5  
female      2           Abelson     0.5  
male        1           Abrahamsson 0  
female      1           Abrahim     0  

I played around with table, aggregate and the most promising one is ddply. I have reached a point where some direction would be helpful, because if I keep going my code will only get long and ugly.

Upvotes: 0

Views: 1991

Answers (3)

Gregor Thomas
Gregor Thomas

Reputation: 145765

Using dplyr

library(dplyr)
table_family %>%
    group_by(Surname) %>%
    mutate(Ratio = sum(Sex == "female") / FamilySize)

Upvotes: 2

jaimedash
jaimedash

Reputation: 2743

Here's a solution with the base R functions aggregate and merge

File dat.csv:

 Sex,FamilySize,Surname
 male,1,Abbing
 female,3,Abbott
 male,3,Abbott
 male,3,Abbott
 male,1,Abelseth
 female,1,Abelseth
 male,2,Abelson
 female,2,Abelson
 male,1,Abrahamsson
 female,1,Abrahim

Code

 d <-  read.csv('dat.csv')

 num_fem <- aggregate(Sex ~ ., dat=d, function(x) length(which(x == 'female')))
 d_rat <- with(num_fem, data.frame(Ratio=Sex / FamilySize, Surname=Surname))

 merge(d, d_rat)

 #       Surname    Sex FamilySize     Ratio
 #1       Abbing   male          1 0.0000000
 #2       Abbott female          3 0.3333333
 #3       Abbott   male          3 0.3333333
 #4       Abbott   male          3 0.3333333
 #5     Abelseth   male          1 1.0000000
 #6     Abelseth female          1 1.0000000
 #7      Abelson   male          2 0.5000000
 #8      Abelson female          2 0.5000000
 #9  Abrahamsson   male          1 0.0000000
 #10     Abrahim female          1 1.0000000

Upvotes: 2

Michele Usuelli
Michele Usuelli

Reputation: 2000

You can do that using data.table

library(data.table)
table_family <- data.table(table_input)
table_family[, Ratio := sum(Sex == "female") / FamilySize[1], by = "Surname"]

Upvotes: 3

Related Questions