Amin
Amin

Reputation: 57

Flexible subseting data.table in R

I'm working with census data. The Data-set looks like below: It's a collection of Households (or families) which is identified with an Id and every member of a household has an Id.in.HH. Each household has a head. Every entity in household has a relation to household head [1 means head, 2 means head's spouse, 3 means head's child]. For example the family with Id == 1 has 4 memebers with Id.in.HH ranging from 1 to 4. This Household has a head [Id == 1 & Id.in.HH == 3] which is clear from it's Relation.to.Head value [Relation.to.Head == 1]. and so on. Gender has two values [1: male, 2: female]. Data-set comes from a country that polygamy is allowed for males. I want to add a column in which family Type is calculated. If it's a family with a male Head and one wife [Type := 1], If it's a family with a female head and one husband [Type:= 2], and if it's a family with a male head with several wives [Type:= 3]. For every kid, her/his mother's Id.In.HH is noted. I want to add a column consisting her/his mother's birth date. I'm working with data.table but an amateur.

Id  Id.in.HH  Relation.to.Head  Gender  Mother.Id.In.HH Birth
1   1         2                 1       NA              1950
1   2         3                 2       3               1975
1   4         3                 2       3               1980
1   3         1                 2       NA              1955
2   2         1                 1       NA              1943             
2   3         2                 2       NA              1945        
2   1         2                 2       NA              1960     
2   5         3                 1       3               1964
2   4         3                 2       1               1980        
3   2         1                 1       NA              1975      
3   3         2                 2       NA              1977    
3   1         3                 1       3               1994         

What I want it to look like is:

Id  Id.in.HH  Relation.to.Head  Gender  Mother.Id.In.HH Birth  Type  Mom.Birth
1   1         2                 1       NA              1950   2     NA
1   2         3                 2       3               1975   2     1955
1   4         3                 2       3               1980   2     1955
1   3         1                 2       NA              1955   2     NA
2   2         1                 1       NA              1943   3     NA           
2   3         2                 2       NA              1945   3     NA 
2   1         2                 2       NA              1960   3     NA
2   5         3                 1       3               1964   3     1945
2   4         3                 2       1               1980   3     1960    
3   2         1                 1       NA              1975   1     NA         
3   3         2                 2       NA              1977   1     NA
3   1         3                 1       3               1994   1     1977 

I have tried @jlhoward solution in this way:

familyType <- function(relation, gender) {
  polygamy = sum(relation == 2, na.rm = TRUE)
  headgender = gender[relation == 1]
  polygamy = as.integer(levels(polygamy))[polygamy]
  headgender = as.integer(levels(headgender))[headgender]
  if(polygamy == 0) {
    if(headgender == 1){
      return (1L)
    } else {
      return (2L)
    }
  } else if(polygamy == 1){
    if(headgender == 1){
      return (3L)
    } else {
      return (4L)
    }
  } else if(polygamy > 1) {
    return (5L)
  }
}

tbl[, type:=familyType(Relation.to.Head, Gender), by=Id]

But I got this error:

Error in if (polygamy == 0) { : missing value where TRUE/FALSE needed

Upvotes: 0

Views: 122

Answers (2)

bramtayl
bramtayl

Reputation: 4024

library(dplyr)

family.head_gender =
  person %>%
  filter(Relation.to.Head == 1) %>%
  select(Id, head_gender = Gender)

family.type = 
  person %>%
  filter(Relation.to.Head == 2) %>%
  group_by(Id) %>%
  summarize(number_of_spouses = n() ) %>%
  left_join(family.head_gender) %>%
  mutate(type = 
           ifelse(head_gender == 1,
             ifelse(number_of_spouses = 1,
               1,
               3),
             2)) %>%
  select(-head_gender, -number_of_spouses)

person.final = 
  person %>%
  left_join(
    person %>%
    select(Mother.Id.In.HH = Id.In.HH,
           Mom.Birth = Birth) ) %>%
  left_join(family.type)

Upvotes: -1

jlhoward
jlhoward

Reputation: 59345

Really two questions. Here's the first part.

get.type <- function(relation,gender) {
  if (sum(relation==2)>1) return(3L)
  gender[relation==1]
}
DT[,Type:=get.type(Relation.to.Head,Gender), by=Id]

Using @PierreLafortune's excellent approach to the second part:

DT[, Mom.Birth := Birth[match(Mother.Id.In.HH, Id.in.HH)], by=Id]
DT
#     Id Id.in.HH Relation.to.Head Gender Mother.Id.In.HH Birth Type Mom.Birth
#  1:  1        1                2      1              NA  1950    2        NA
#  2:  1        2                3      2               3  1975    2      1955
#  3:  1        4                3      2               3  1980    2      1955
#  4:  1        3                1      2              NA  1955    2        NA
#  5:  2        2                1      1              NA  1943    3        NA
#  6:  2        3                2      2              NA  1945    3        NA
#  7:  2        1                2      2              NA  1960    3        NA
#  8:  2        5                3      1               3  1964    3      1945
#  9:  2        4                3      2               1  1980    3      1960
# 10:  3        2                1      1              NA  1975    1        NA
# 11:  3        3                2      2              NA  1977    1        NA
# 12:  3        1                3      1               3  1994    1      1977

Upvotes: 3

Related Questions