Eric Green

Reputation: 7725

tidyverse replacement for reshape() for panel data with round suffix

What is the tidyverse replacement for reshape() in this example? I want the wide version to take the name of the round: v2.1 and v2.2. I thought it should be gather(), but I haven't figured it out.

library(tidyverse)
r1 <- data.frame(id=c(1, 2, 3),
                 v1=c(1, 1, 0),
                 v2=c(0, 1, 1),
                 round=c(1, 1, 1))

r2 <- data.frame(id=c(1, 2, 3),
                 v2=c(1, 0, 0),
                 round=c(2, 2, 2))

r12 <- bind_rows(r1, r2)

r12w <- reshape(r12,
                timevar = "round",
                v.names = "v2",
                idvar = "id",
                direction = "wide")
r12w

#  id v1 v2.1 v2.2
#1  1  1    0    1
#2  2  1    1    0
#3  3  0    1    0

Updated example with unbalanced rows across datasets.

r1 <- data.frame(id=c(1, 2, 3, 4),
                 v1=c(1, 1, 0, 0),
                 v2=c(0, 1, 1, 1),
                 round=c(1, 1, 1, 1))

r2 <- data.frame(id=c(1, 2, 3),
                 v2=c(1, 0, 0),
                 round=c(2, 2, 2))

This mimics a panel survey where some people are not found or refuse in later rounds. Here, person 4 is in r1 but not in r2. We want to keep this person in the final dataset, with an NA value for v2.2. Here is the desired output. I am looking for a tidyverse approach to go from r1 and r2 to this output.

#  id v1 v2.1 v2.2
#1  1  1    0    1
#2  2  1    1    0
#3  3  0    1    0
#4  4  0    1   NA

Upvotes: 0

Views: 157

Answers (2)

meriops

Reputation: 1037

I am not sure I fully understand what you want but here is an attempt:

library(dplyr)
full_join(r1, r2, by = "id", suffix = c(".1", ".2")) %>%
  select(-starts_with("round"))
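As a quick check, here is a minimal sketch that runs the same join on the unbalanced data from the updated question. The object name `joined` is only illustrative. Because full_join keeps every id found in either data frame, person 4 stays in the result with NA for the round 2 value.

```r
library(dplyr)

# Unbalanced data copied from the updated example in the question
r1 <- data.frame(id = c(1, 2, 3, 4),
                 v1 = c(1, 1, 0, 0),
                 v2 = c(0, 1, 1, 1),
                 round = c(1, 1, 1, 1))
r2 <- data.frame(id = c(1, 2, 3),
                 v2 = c(1, 0, 0),
                 round = c(2, 2, 2))

# Columns that clash (v2, round) get the suffixes .1 and .2;
# the round columns are then dropped
joined <- full_join(r1, r2, by = "id", suffix = c(".1", ".2")) %>%
  select(-starts_with("round"))
joined
#   id v1 v2.1 v2.2
# 1  1  1    0    1
# 2  2  1    1    0
# 3  3  0    1    0
# 4  4  0    1   NA
```

Note that the suffix approach is tied to joining exactly two rounds at a time.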

Upvotes: 1

akrun

Reputation: 887501

We create the missing column in 'r2' before doing the bind_rows by copying that column from 'r1'. For this, we can use setdiff to get the names of the columns that are found in 'r1' but not in 'r2'. Then, we paste the string 'v2.' onto the 'round' column and spread to 'wide' format, so that each round becomes its own column.

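# Columns present in r1 but missing from r2 (here: "v1")
nm1 <- setdiff(names(r1), names(r2))
# Copy them into r2; requires the same rows in the same order in both
r2[nm1] <- r1[nm1]
bind_rows(r1, r2) %>%
      mutate(round = paste0("v2.", round)) %>%  # build the wide column names
      spread(round, v2)                         # one column per round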
#  id v1 v2.1 v2.2
#1  1  1    0    1
#2  2  1    1    0
#3  3  0    1    0

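NOTE: Here, we are assuming that the datasets have the same number of rows, with the ids in the same order. The column assignment fails otherwise, so this does not cover the unbalanced example as written.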
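When the rounds do not have the same number of rows, as in the updated example, one possible sketch is to reshape only the time-varying column with pivot_wider (the newer tidyr replacement for spread) and then join the result back onto the round 1 columns. This assumes tidyr 1.0.0 or later, and that 'v1' is only ever recorded in 'r1'. The object names `v2_wide` and `r12w` are just for illustration.

```r
library(dplyr)
library(tidyr)

# Unbalanced data copied from the updated example in the question
r1 <- data.frame(id = c(1, 2, 3, 4),
                 v1 = c(1, 1, 0, 0),
                 v2 = c(0, 1, 1, 1),
                 round = c(1, 1, 1, 1))
r2 <- data.frame(id = c(1, 2, 3),
                 v2 = c(1, 0, 0),
                 round = c(2, 2, 2))

# Stack the rounds, keep only the id, the round and the repeated measure,
# then spread v2 into one column per round (v2.1, v2.2)
v2_wide <- bind_rows(r1, r2) %>%
  select(id, round, v2) %>%
  pivot_wider(names_from = round, values_from = v2, names_prefix = "v2.")

# Re-attach the column that exists only in round 1 (assumption: v1 lives in r1)
r12w <- r1 %>%
  select(id, v1) %>%
  left_join(v2_wide, by = "id")
r12w
```

Ids that are missing from a round get NA in that round's column, so person 4 is kept with NA for v2.2. The same pattern should extend to more rounds by adding them to the bind_rows call.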

Upvotes: 1
