Reputation: 3
I have a panel dataset that includes three types of IDs:
The mother and fathers ID's are matched to their child's xwaveid but the row only contains the variables for the child. This is the same with the mothers and fathers. I.e. all variables are linked to the individuals xwaveid. So the data looks something like this:
xwaveid mother_id father_id weight wave
001 005 006 85 a
005 na na 96 a
006 na na 99 a
001 005 006 70 b
005 na na 90 b
006 na na 94 b
Where the ID 001 is the child's unique ID and the 005 and 006 are both the xwaveid for the mother and father and also their own unique ID where the data for their weight is attached. I would like to make this kind of structure:
xwaveid_child xwaveid_mother xwaveid_father child_weight mother_weight father_weight wave
001 005 006 85 96 99 a
001 005 006 70 90 94 b
I do not even know where to start and would greatly appreciate any help!
Thanks a lot
Upvotes: 0
Views: 296
Reputation: 886948
The NA
are not real NA and it is a string "na". Convert those to NA
, then fill
the NA elements with the previous non-NA elements in mother_id, father_id, create a sequence column (rowid
) and reshape to 'wide' format (pivot_wider
)
library(dplyr)
library(data.table)
library(tidyr)
df1 %>%
mutate(across(c(mother_id, father_id), na_if, "na")) %>%
fill(c(mother_id, father_id)) %>%
mutate(rn = rowid(mother_id, father_id, wave)) %>%
pivot_wider(names_from = rn, values_from = c(xwaveid, weight))
df1 <- structure(list(xwaveid = c(1L, 5L, 6L, 1L, 5L, 6L), mother_id = c("005",
"na", "na", "005", "na", "na"), father_id = c("006", "na", "na",
"006", "na", "na"), weight = c(85L, 96L, 99L, 70L, 90L, 94L),
wave = c("a", "a", "a", "b", "b", "b")), class = "data.frame",
row.names = c(NA,
-6L))
Upvotes: 1