ravinglooper
ravinglooper

Reputation: 219

Recode relationship matrices based on new subgrouping

Problem: I have a survey dataset which includes intra-household relationships. I had to subdivide household into tax-unit, which means I need to redefine the relationship matrices based on the new tax-unit grouping.

Intra-family relationships are coded as:

1 = Spouse, 2= Cohabiting partner, 3 = Son/daughter, 4 = Step son/daughter, 5 = Foster child, 6 = Son-in-law/daughter-in-law, 7 = Parent/guardian, 8 = Step-parent, 9 = Foster parent, 10 = Parent-in-law, 11 = Brother/sister, 12 = Step-brother/sister, 13 = Foster brother/sister, 14 = Brother/sister-in-law, 15 = Grand-child, 16 = Grand-parent, 17 = Other relative, 18 = Other non-relative.

Relationship corollaries are: rel = c("1" = 1, "2" = 2, "3" = 7, "4" = 8, "5" = 9, "6" = 10, "7" = 3, "8" = 4, "9" = 5, "10" = 6, "11" = 11, "12" = 12, "13" = 13, "14" = 14, "15" = 16, "16" = 15, "17" = 17, "18" = 18)

Example Dataset

   household person r01 r02 r03 r04 r05 r06 household.tu
1          1      1  NA   1   7   7  NA  NA           1a
2          1      2   1  NA   7   7  NA  NA           1a
3          1      3   3   3  NA  11  NA  NA           1b
4          1      4   3   3  11  NA  NA  NA           1a
5          2      1  NA   7  16  NA  NA  NA           2a
6          2      2   3  NA   7  NA  NA  NA           2b
7          2      3  15   3  NA  NA  NA  NA           2b
8          3      1  NA  18  NA  NA  NA  NA           3a
9          3      2  18  NA  NA  NA  NA  NA           3b
10         4      1  NA  NA  NA  NA  NA  NA           4a
11         5      1  NA   9  NA  NA  NA  NA           5a
12         5      2   5  NA  NA  NA  NA  NA           5b

Code to Reproduce

data.frame(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
           person = c(1,2,3,4,1,2,3,1,2,1,1,2),
           r01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
           r02 = c(1, NA, 3, 3, 7, NA, 3, 18, NA, NA, 9, NA),
           r03 = c(7, 7, NA, 11, 16, 7, rep(NA,6)),
           r04 = c(7, 7, 11, rep(NA, 9)),
           r05 = rep(NA, 12),
           r06 = rep(NA, 12),
           household.tu = c("1a", "1a", "1b", "1a", "2a", "2b", "2b", "3a", "3b", "4a", "5a", "5b"))

The final column household.tu in essence creates new households, so the new relationship matrices r01.tu... should be based on a person count within the new tax-units person.tu

Desired Output

   household person r01 r02 r03 r04 r05 r06 household.tu person.tu r01.tu r02.tu r03.tu r04.tu r05.tu r06.tu
1          1      1  NA   1   7   7  NA  NA           1a         1     NA      1      7     NA     NA     NA
2          1      2   1  NA   7   7  NA  NA           1a         2      1     NA      7     NA     NA     NA
3          1      3   3   3  NA  11  NA  NA           1b         1     NA     NA     NA     NA     NA     NA
4          1      4   3   3  11  NA  NA  NA           1a         3      3      3     NA     NA     NA     NA
5          2      1  NA   7  16  NA  NA  NA           2a         1     NA     NA     NA     NA     NA     NA
6          2      2   3  NA   7  NA  NA  NA           2b         1     NA      7     NA     NA     NA     NA
7          2      3  15   3  NA  NA  NA  NA           2b         2      3     NA     NA     NA     NA     NA
8          3      1  NA  18  NA  NA  NA  NA           3a         1     NA     NA     NA     NA     NA     NA
9          3      2  18  NA  NA  NA  NA  NA           3b         1     NA     NA     NA     NA     NA     NA
10         4      1  NA  NA  NA  NA  NA  NA           4a         1     NA     NA     NA     NA     NA     NA
11         5      1  NA   9  NA  NA  NA  NA           5a         1     NA     NA     NA     NA     NA     NA
12         5      2   5  NA  NA  NA  NA  NA           5b         1     NA     NA     NA     NA     NA     NA

I have made several attempts using dplyr approaches, however I think this requires writing a function which is where I am struggling to understand how to broach.

Thankyou

Upvotes: 1

Views: 46

Answers (2)

A. S. K.
A. S. K.

Reputation: 2816

Not as concise as the data.table approach, but here's a method that uses tidyverse.

# Pivot from wide to long: one row per relationship.
tu.df = df %>%
  pivot_longer(cols = matches("^r[0-9]+$"), names_to = "other.person",
               names_prefix = "r", values_to = "relationship") %>%
  filter(!is.na(relationship)) %>%
  mutate(other.person = as.numeric(other.person))

# Renumber persons within tax units.
tu.df = tu.df %>%
  group_by(household.tu) %>%
  arrange(person, other.person) %>%
  mutate(person.tu = dense_rank(person)) %>%
  ungroup() %>%
  arrange(household.tu, person.tu)

# Get the renumbered ID of the other person in the relationship.
tu.df = tu.df %>%
  inner_join(tu.df %>%
               dplyr::select(household.tu, person,
                             other.person.tu = person.tu) %>%
               distinct(),
             by = c("household.tu", "other.person" = "person"))

# Pivot from long back to wide.
tu.df = tu.df %>%
  dplyr::select(-c("other.person")) %>%
  mutate(other.person.tu = str_pad(other.person.tu, 2, "left", "0"),
         other.person.tu = paste("r", other.person.tu, ".tu", sep = "")) %>%
  pivot_wider(id_cols = c("household", "person", "household.tu", "person.tu"),
              names_from = "other.person.tu", names_sort = T,
              values_from = "relationship")

# Add back to the original table, if desired.
df %>%
  left_join(tu.df, by = c("household", "person", "household.tu"))

Naturally, several of these steps could be chained together. The only place where chaining does not work is the self-join that gets renumbered IDs for the second person in the relationship. (There may be a way to do this without a self-join, but I went with this because it works and it's relatively straightforward.)

Upvotes: 1

jblood94
jblood94

Reputation: 17011

With data.table:

nms <- names(dt)[3:8]
dtNA <- setnames(as.data.frame(rep(list(dt$r01[1]), 6)), nms)
f <- function(dt, sel) c(dt[,..sel], dtNA[,-sel])[nms]
dt[
  ,c("person.tu", paste0(nms, ".tu")) := c(.(1:.N), f(.SD, person)),
  by = household.tu, .SDcols = c(nms, "person")
][]
#>     household person   r01   r02   r03   r04    r05    r06 household.tu person.tu r01.tu r02.tu r03.tu r04.tu r05.tu r06.tu
#>         <num>  <num> <num> <num> <num> <num> <lgcl> <lgcl>       <char>     <int>  <num>  <num>  <num>  <num>  <num>  <num>
#>  1:         1      1    NA     1     7     7     NA     NA           1a         1     NA      1     NA      7     NA     NA
#>  2:         1      2     1    NA     7     7     NA     NA           1a         2      1     NA     NA      7     NA     NA
#>  3:         1      3     3     3    NA    11     NA     NA           1b         1     NA     NA     NA     NA     NA     NA
#>  4:         1      4     3     3    11    NA     NA     NA           1a         3      3      3     NA     NA     NA     NA
#>  5:         2      1    NA     7    16    NA     NA     NA           2a         1     NA     NA     NA     NA     NA     NA
#>  6:         2      2     3    NA     7    NA     NA     NA           2b         1     NA     NA      7     NA     NA     NA
#>  7:         2      3    15     3    NA    NA     NA     NA           2b         2     NA      3     NA     NA     NA     NA
#>  8:         3      1    NA    18    NA    NA     NA     NA           3a         1     NA     NA     NA     NA     NA     NA
#>  9:         3      2    18    NA    NA    NA     NA     NA           3b         1     NA     NA     NA     NA     NA     NA
#> 10:         4      1    NA    NA    NA    NA     NA     NA           4a         1     NA     NA     NA     NA     NA     NA
#> 11:         5      1    NA     9    NA    NA     NA     NA           5a         1     NA     NA     NA     NA     NA     NA
#> 12:         5      2     5    NA    NA    NA     NA     NA           5b         1     NA     NA     NA     NA     NA     NA

Data:

library(data.table)

dt <- data.table(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
           person = c(1,2,3,4,1,2,3,1,2,1,1,2),
           r01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
           r02 = c(1, NA, 3, 3, 7, NA, 3, 18, NA, NA, 9, NA),
           r03 = c(7, 7, NA, 11, 16, 7, rep(NA,6)),
           r04 = c(7, 7, 11, rep(NA, 9)),
           r05 = rep(NA, 12),
           r06 = rep(NA, 12),
           household.tu = c("1a", "1a", "1b", "1a", "2a", "2b", "2b", "3a", "3b", "4a", "5a", "5b"))

Upvotes: 1

Related Questions