Fabio
Fabio

Reputation: 27

How to reshape a dataframe from wide to long with columns that represent duplicated variables?

I have the following dataframe, I would like to 'transfer' the values of columns from v3 to v10 in the first two columns v1 and v2, respectively, since they are duplicated variables and their values belong to the first two variables (v1 and v2).

df <- data.frame(id = 1:5,
              v1 = c("infection", "anxiety", "pain", "infection", "aspiration"),
              v2 = c(31, 55, 33, 31, 24),
              v3 = c("aspiration", NA, "ulcer", NA, "pain"),
              v4 = c(26, NA, 43, NA, 20),
              v5 = c("skin", NA, NA, NA, NA),
              v6 = c(25, NA, NA, NA, NA),
              v7 = c("ventilation", NA, NA, NA, NA),
              v8 = c(8, NA, NA, NA, NA),
              v9 = c("gas", NA, NA, NA, NA),
              v10 = c(4, NA, NA, NA, NA))

I would like to have as output this new df:

out <- data.frame(id = c(1,1,1,1,1,2,3,3,4,5,5),
              v1 = c("infection", "aspiration", "skin", "ventilation", "gas", "anxiety", "pain", "ulcer", "infection", "aspiration", "pain"),
              v2 = c(31, 26, 25, 8, 4, 55, 33, 43, 31, 24, 20))

Thank you so much for your help!

Upvotes: 1

Views: 54

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389055

Renaming column would be better since it will clearly identify which column belongs to which group. You can then use pivot_longer directly by using names_pattern argument.

library(dplyr)
library(tidyr)

df %>%
  rename_with(~paste(., c('v1', 'v2'), sep = '_'), -1) %>%
  pivot_longer(cols = -id, 
               names_to = '.value', 
               names_pattern = '.*_(\\w+)', 
               values_drop_na = TRUE) -> out
out

#      id v1             v2
#   <int> <chr>       <dbl>
# 1     1 infection      31
# 2     1 aspiration     26
# 3     1 skin           25
# 4     1 ventilation     8
# 5     1 gas             4
# 6     2 anxiety        55
# 7     3 pain           33
# 8     3 ulcer          43
# 9     4 infection      31
#10     5 aspiration     24
#11     5 pain           20

Upvotes: 3

dash2
dash2

Reputation: 2262

This is ugly and I'm not sure how it works, but it does:

library(dplyr)
library(tidyr)

# make life easier for tidyr
names(df)[seq(2, 11, 2)]  <- paste0("cond_", 1:5)
names(df)[seq(3, 11, 2)]  <- paste0("var_", 1:5)

# stops tidyr complaining; if the numerics of var matter, you can change them back later
df <- df %>% mutate(across(starts_with("var"), as.character))

df %>% 
      pivot_longer(-id, names_to = c("var", "v2"), names_sep = "_") %>%
      pivot_wider(c(id, v2), names_from = var) %>% 
      na.omit()

Upvotes: 1

Related Questions