Reputation: 57
I'm working with census data. The Data-set looks like below: It's a collection of Households (or families) which is identified with an Id
and every member of a household has an Id.in.HH
. Each household has a head. Every entity in household has a relation
to household head [1
means head, 2
means head's spouse, 3
means head's child]. For example the family with Id == 1
has 4 memebers with Id.in.HH
ranging from 1 to 4. This Household has a head [Id == 1 & Id.in.HH == 3
] which is clear from it's Relation.to.Head
value [Relation.to.Head == 1
]. and so on. Gender
has two values [1: male
, 2: female
]. Data-set comes from a country that polygamy is allowed for males. I want to add a column in which family Type
is calculated. If it's a family with a male Head and one wife [Type := 1
], If it's a family with a female head and one husband [Type:= 2
], and if it's a family with a male head with several wives [Type:= 3
]. For every kid, her/his mother's Id.In.HH
is noted. I want to add a column consisting her/his mother's birth date. I'm working with data.table
but an amateur.
Id Id.in.HH Relation.to.Head Gender Mother.Id.In.HH Birth
1 1 2 1 NA 1950
1 2 3 2 3 1975
1 4 3 2 3 1980
1 3 1 2 NA 1955
2 2 1 1 NA 1943
2 3 2 2 NA 1945
2 1 2 2 NA 1960
2 5 3 1 3 1964
2 4 3 2 1 1980
3 2 1 1 NA 1975
3 3 2 2 NA 1977
3 1 3 1 3 1994
What I want it to look like is:
Id Id.in.HH Relation.to.Head Gender Mother.Id.In.HH Birth Type Mom.Birth
1 1 2 1 NA 1950 2 NA
1 2 3 2 3 1975 2 1955
1 4 3 2 3 1980 2 1955
1 3 1 2 NA 1955 2 NA
2 2 1 1 NA 1943 3 NA
2 3 2 2 NA 1945 3 NA
2 1 2 2 NA 1960 3 NA
2 5 3 1 3 1964 3 1945
2 4 3 2 1 1980 3 1960
3 2 1 1 NA 1975 1 NA
3 3 2 2 NA 1977 1 NA
3 1 3 1 3 1994 1 1977
I have tried @jlhoward solution in this way:
familyType <- function(relation, gender) {
polygamy = sum(relation == 2, na.rm = TRUE)
headgender = gender[relation == 1]
polygamy = as.integer(levels(polygamy))[polygamy]
headgender = as.integer(levels(headgender))[headgender]
if(polygamy == 0) {
if(headgender == 1){
return (1L)
} else {
return (2L)
}
} else if(polygamy == 1){
if(headgender == 1){
return (3L)
} else {
return (4L)
}
} else if(polygamy > 1) {
return (5L)
}
}
tbl[, type:=familyType(Relation.to.Head, Gender), by=Id]
But I got this error:
Error in if (polygamy == 0) { : missing value where TRUE/FALSE needed
Upvotes: 0
Views: 122
Reputation: 4024
library(dplyr)
family.head_gender =
person %>%
filter(Relation.to.Head == 1) %>%
select(Id, head_gender = Gender)
family.type =
person %>%
filter(Relation.to.Head == 2) %>%
group_by(Id) %>%
summarize(number_of_spouses = n() ) %>%
left_join(family.head_gender) %>%
mutate(type =
ifelse(head_gender == 1,
ifelse(number_of_spouses = 1,
1,
3),
2)) %>%
select(-head_gender, -number_of_spouses)
person.final =
person %>%
left_join(
person %>%
select(Mother.Id.In.HH = Id.In.HH,
Mom.Birth = Birth) ) %>%
left_join(family.type)
Upvotes: -1
Reputation: 59345
Really two questions. Here's the first part.
get.type <- function(relation,gender) {
if (sum(relation==2)>1) return(3L)
gender[relation==1]
}
DT[,Type:=get.type(Relation.to.Head,Gender), by=Id]
Using @PierreLafortune's excellent approach to the second part:
DT[, Mom.Birth := Birth[match(Mother.Id.In.HH, Id.in.HH)], by=Id]
DT
# Id Id.in.HH Relation.to.Head Gender Mother.Id.In.HH Birth Type Mom.Birth
# 1: 1 1 2 1 NA 1950 2 NA
# 2: 1 2 3 2 3 1975 2 1955
# 3: 1 4 3 2 3 1980 2 1955
# 4: 1 3 1 2 NA 1955 2 NA
# 5: 2 2 1 1 NA 1943 3 NA
# 6: 2 3 2 2 NA 1945 3 NA
# 7: 2 1 2 2 NA 1960 3 NA
# 8: 2 5 3 1 3 1964 3 1945
# 9: 2 4 3 2 1 1980 3 1960
# 10: 3 2 1 1 NA 1975 1 NA
# 11: 3 3 2 2 NA 1977 1 NA
# 12: 3 1 3 1 3 1994 1 1977
Upvotes: 3