Reputation: 11
Say my dataframe in R looks like the one below. Sex is male/female. Familysize is the number of family members with the same surname. Surname is the surname.
Sex FamilySize Surname
male 1 Abbing
female 3 Abbott
male 3 Abbott
male 3 Abbott
male 1 Abelseth
female 1 Abelseth
male 2 Abelson
female 2 Abelson
male 1 Abrahamsson
female 1 Abrahim
I want to add a new column FemaleToFamilySizeRatio, that will give me the ratio of the number of Females in each family. The results would look like below:
Sex FamilySize Surname Ratio
male 1 Abbing 0
female 3 Abbott 0.33
male 3 Abbott 0.33
male 3 Abbott 0.33
male 1 Abelseth 0.5
female 1 Abelseth 0.5
male 2 Abelson 0.5
female 2 Abelson 0.5
male 1 Abrahamsson 0
female 1 Abrahim 0
I played around with table, aggregate and the most promising one is ddply. I have reached a point where some direction would be helpful, because if I keep going my code will only get long and ugly.
Upvotes: 0
Views: 1991
Reputation: 145765
Using dplyr
library(dplyr)
table_family %>%
group_by(Surname) %>%
mutate(Ratio = sum(Sex == "female") / FamilySize)
Upvotes: 2
Reputation: 2743
Here's a solution with the base R functions aggregate and merge
File dat.csv:
Sex,FamilySize,Surname
male,1,Abbing
female,3,Abbott
male,3,Abbott
male,3,Abbott
male,1,Abelseth
female,1,Abelseth
male,2,Abelson
female,2,Abelson
male,1,Abrahamsson
female,1,Abrahim
Code
d <- read.csv('dat.csv')
num_fem <- aggregate(Sex ~ ., dat=d, function(x) length(which(x == 'female')))
d_rat <- with(num_fem, data.frame(Ratio=Sex / FamilySize, Surname=Surname))
merge(d, d_rat)
# Surname Sex FamilySize Ratio
#1 Abbing male 1 0.0000000
#2 Abbott female 3 0.3333333
#3 Abbott male 3 0.3333333
#4 Abbott male 3 0.3333333
#5 Abelseth male 1 1.0000000
#6 Abelseth female 1 1.0000000
#7 Abelson male 2 0.5000000
#8 Abelson female 2 0.5000000
#9 Abrahamsson male 1 0.0000000
#10 Abrahim female 1 1.0000000
Upvotes: 2
Reputation: 2000
You can do that using data.table
library(data.table)
table_family <- data.table(table_input)
table_family[, Ratio := sum(Sex == "female") / FamilySize[1], by = "Surname"]
Upvotes: 3