Reputation: 219
Problem:
I have a survey dataset which includes intra-household relationships. I had to subdivide household
into tax-unit
, which means I need to redefine the relationship matrices based on the new tax-unit
grouping.
Intra-family relationships are coded as:
1
= Spouse, 2
= Cohabiting partner, 3
= Son/daughter, 4
= Step son/daughter, 5
= Foster child, 6
= Son-in-law/daughter-in-law, 7
= Parent/guardian, 8
= Step-parent, 9
= Foster parent, 10
= Parent-in-law, 11
= Brother/sister, 12
= Step-brother/sister, 13
= Foster brother/sister, 14
= Brother/sister-in-law, 15
= Grand-child, 16
= Grand-parent, 17
= Other relative, 18
= Other non-relative.
Relationship corollaries are:
rel = c("1" = 1, "2" = 2, "3" = 7, "4" = 8, "5" = 9, "6" = 10, "7" = 3, "8" = 4, "9" = 5, "10" = 6, "11" = 11, "12" = 12, "13" = 13, "14" = 14, "15" = 16, "16" = 15, "17" = 17, "18" = 18)
Example Dataset
household person r01 r02 r03 r04 r05 r06 household.tu
1 1 1 NA 1 7 7 NA NA 1a
2 1 2 1 NA 7 7 NA NA 1a
3 1 3 3 3 NA 11 NA NA 1b
4 1 4 3 3 11 NA NA NA 1a
5 2 1 NA 7 16 NA NA NA 2a
6 2 2 3 NA 7 NA NA NA 2b
7 2 3 15 3 NA NA NA NA 2b
8 3 1 NA 18 NA NA NA NA 3a
9 3 2 18 NA NA NA NA NA 3b
10 4 1 NA NA NA NA NA NA 4a
11 5 1 NA 9 NA NA NA NA 5a
12 5 2 5 NA NA NA NA NA 5b
Code to Reproduce
data.frame(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
person = c(1,2,3,4,1,2,3,1,2,1,1,2),
r01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
r02 = c(1, NA, 3, 3, 7, NA, 3, 18, NA, NA, 9, NA),
r03 = c(7, 7, NA, 11, 16, 7, rep(NA,6)),
r04 = c(7, 7, 11, rep(NA, 9)),
r05 = rep(NA, 12),
r06 = rep(NA, 12),
household.tu = c("1a", "1a", "1b", "1a", "2a", "2b", "2b", "3a", "3b", "4a", "5a", "5b"))
The final column household.tu
in essence creates new households, so the new relationship matrices r01.tu
... should be based on a person
count within the new tax-units person.tu
Desired Output
household person r01 r02 r03 r04 r05 r06 household.tu person.tu r01.tu r02.tu r03.tu r04.tu r05.tu r06.tu
1 1 1 NA 1 7 7 NA NA 1a 1 NA 1 7 NA NA NA
2 1 2 1 NA 7 7 NA NA 1a 2 1 NA 7 NA NA NA
3 1 3 3 3 NA 11 NA NA 1b 1 NA NA NA NA NA NA
4 1 4 3 3 11 NA NA NA 1a 3 3 3 NA NA NA NA
5 2 1 NA 7 16 NA NA NA 2a 1 NA NA NA NA NA NA
6 2 2 3 NA 7 NA NA NA 2b 1 NA 7 NA NA NA NA
7 2 3 15 3 NA NA NA NA 2b 2 3 NA NA NA NA NA
8 3 1 NA 18 NA NA NA NA 3a 1 NA NA NA NA NA NA
9 3 2 18 NA NA NA NA NA 3b 1 NA NA NA NA NA NA
10 4 1 NA NA NA NA NA NA 4a 1 NA NA NA NA NA NA
11 5 1 NA 9 NA NA NA NA 5a 1 NA NA NA NA NA NA
12 5 2 5 NA NA NA NA NA 5b 1 NA NA NA NA NA NA
I have made several attempts using dplyr
approaches, however I think this requires writing a function which is where I am struggling to understand how to broach.
Thankyou
Upvotes: 1
Views: 46
Reputation: 2816
Not as concise as the data.table
approach, but here's a method that uses tidyverse
.
# Pivot from wide to long: one row per relationship.
tu.df = df %>%
pivot_longer(cols = matches("^r[0-9]+$"), names_to = "other.person",
names_prefix = "r", values_to = "relationship") %>%
filter(!is.na(relationship)) %>%
mutate(other.person = as.numeric(other.person))
# Renumber persons within tax units.
tu.df = tu.df %>%
group_by(household.tu) %>%
arrange(person, other.person) %>%
mutate(person.tu = dense_rank(person)) %>%
ungroup() %>%
arrange(household.tu, person.tu)
# Get the renumbered ID of the other person in the relationship.
tu.df = tu.df %>%
inner_join(tu.df %>%
dplyr::select(household.tu, person,
other.person.tu = person.tu) %>%
distinct(),
by = c("household.tu", "other.person" = "person"))
# Pivot from long back to wide.
tu.df = tu.df %>%
dplyr::select(-c("other.person")) %>%
mutate(other.person.tu = str_pad(other.person.tu, 2, "left", "0"),
other.person.tu = paste("r", other.person.tu, ".tu", sep = "")) %>%
pivot_wider(id_cols = c("household", "person", "household.tu", "person.tu"),
names_from = "other.person.tu", names_sort = T,
values_from = "relationship")
# Add back to the original table, if desired.
df %>%
left_join(tu.df, by = c("household", "person", "household.tu"))
Naturally, several of these steps could be chained together. The only place where chaining does not work is the self-join that gets renumbered IDs for the second person in the relationship. (There may be a way to do this without a self-join, but I went with this because it works and it's relatively straightforward.)
Upvotes: 1
Reputation: 17011
With data.table
:
nms <- names(dt)[3:8]
dtNA <- setnames(as.data.frame(rep(list(dt$r01[1]), 6)), nms)
f <- function(dt, sel) c(dt[,..sel], dtNA[,-sel])[nms]
dt[
,c("person.tu", paste0(nms, ".tu")) := c(.(1:.N), f(.SD, person)),
by = household.tu, .SDcols = c(nms, "person")
][]
#> household person r01 r02 r03 r04 r05 r06 household.tu person.tu r01.tu r02.tu r03.tu r04.tu r05.tu r06.tu
#> <num> <num> <num> <num> <num> <num> <lgcl> <lgcl> <char> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1 NA 1 7 7 NA NA 1a 1 NA 1 NA 7 NA NA
#> 2: 1 2 1 NA 7 7 NA NA 1a 2 1 NA NA 7 NA NA
#> 3: 1 3 3 3 NA 11 NA NA 1b 1 NA NA NA NA NA NA
#> 4: 1 4 3 3 11 NA NA NA 1a 3 3 3 NA NA NA NA
#> 5: 2 1 NA 7 16 NA NA NA 2a 1 NA NA NA NA NA NA
#> 6: 2 2 3 NA 7 NA NA NA 2b 1 NA NA 7 NA NA NA
#> 7: 2 3 15 3 NA NA NA NA 2b 2 NA 3 NA NA NA NA
#> 8: 3 1 NA 18 NA NA NA NA 3a 1 NA NA NA NA NA NA
#> 9: 3 2 18 NA NA NA NA NA 3b 1 NA NA NA NA NA NA
#> 10: 4 1 NA NA NA NA NA NA 4a 1 NA NA NA NA NA NA
#> 11: 5 1 NA 9 NA NA NA NA 5a 1 NA NA NA NA NA NA
#> 12: 5 2 5 NA NA NA NA NA 5b 1 NA NA NA NA NA NA
Data:
library(data.table)
dt <- data.table(household = c(1,1,1,1,2,2,2,3,3,4,5,5),
person = c(1,2,3,4,1,2,3,1,2,1,1,2),
r01 = c(NA, 1, 3, 3, NA, 3, 15, NA, 18, NA, NA, 5),
r02 = c(1, NA, 3, 3, 7, NA, 3, 18, NA, NA, 9, NA),
r03 = c(7, 7, NA, 11, 16, 7, rep(NA,6)),
r04 = c(7, 7, 11, rep(NA, 9)),
r05 = rep(NA, 12),
r06 = rep(NA, 12),
household.tu = c("1a", "1a", "1b", "1a", "2a", "2b", "2b", "3a", "3b", "4a", "5a", "5b"))
Upvotes: 1