Jason130497
Jason130497

Reputation: 3

Combine rows into one with tidyverse based off IDs

I have a panel dataset that includes three types of IDs:

  1. An ID for every unique individual in the data, called xwaveid
  2. An ID for mothers that is matched to their child by the xwaveid, called mother_id. This ID is the same as their xwaveid.
  3. An ID for fathers that is the same as the mothers.

The mother and fathers ID's are matched to their child's xwaveid but the row only contains the variables for the child. This is the same with the mothers and fathers. I.e. all variables are linked to the individuals xwaveid. So the data looks something like this:

xwaveid mother_id father_id weight wave 
001     005       006       85     a
005     na        na        96     a
006     na        na        99     a
001     005       006       70     b
005     na        na        90     b
006     na        na        94     b

Where the ID 001 is the child's unique ID and the 005 and 006 are both the xwaveid for the mother and father and also their own unique ID where the data for their weight is attached. I would like to make this kind of structure:

xwaveid_child xwaveid_mother xwaveid_father child_weight mother_weight father_weight wave
001           005            006            85           96            99            a
001           005            006            70           90            94            b

I do not even know where to start and would greatly appreciate any help!

Thanks a lot

Upvotes: 0

Views: 296

Answers (1)

akrun
akrun

Reputation: 886948

The NA are not real NA and it is a string "na". Convert those to NA, then fill the NA elements with the previous non-NA elements in mother_id, father_id, create a sequence column (rowid) and reshape to 'wide' format (pivot_wider)

library(dplyr)
library(data.table)
library(tidyr)
df1 %>%
   mutate(across(c(mother_id, father_id), na_if, "na")) %>% 
   fill(c(mother_id, father_id)) %>%
   mutate(rn = rowid(mother_id, father_id, wave)) %>% 
   pivot_wider(names_from = rn, values_from = c(xwaveid, weight))

data

df1 <- structure(list(xwaveid = c(1L, 5L, 6L, 1L, 5L, 6L), mother_id = c("005", 
"na", "na", "005", "na", "na"), father_id = c("006", "na", "na", 
"006", "na", "na"), weight = c(85L, 96L, 99L, 70L, 90L, 94L), 
    wave = c("a", "a", "a", "b", "b", "b")), class = "data.frame",
    row.names = c(NA, 
-6L))

Upvotes: 1

Related Questions