R: reshape dataframe from wide to long format based on compound column names

Question

I have a dataframe containing observations for two sets of data (A, B), with the set and the observation type encoded in the column names:

mydf <- data.frame(meta1=paste0("a",1:2), meta2=paste0("b",1:2), 
                   A_var1 = c(11:12), A_var2 = c("p","r"), 
                   B_var1 = c(21:22), B_var2 = c("x","z"))

I would like to reshape this dataframe so that each row contains observations on one set only. In this long format, the set and the column names should be given by splitting the original column names at the '_':

mydf2 <- data.frame(meta1=rep(paste0("a",1:2),2), 
                    meta2=rep(paste0("b",1:2),2),
                    set=rep(c("A","B"),each=2),
                    var1 = c(11:12,21:22),
                    var2 = c("p","r","x","z"))

I have tried using 'gather' in combination with 'str_split' and 'sub', but unfortunately without success. Could this be done using tidyverse functions?

Matt W. · Accepted Answer

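Yes, you can do this with the tidyverse!

You were close: gather the data columns into key/value pairs, separate the key at the '_' into the set and the variable name, then spread the variable names back out into columns.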

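library(tidyverse)  # loads tidyr and the %>% pipe

new_df <- mydf %>%
  gather(set, vars, 3:6) %>%                            # stack columns 3 to 6 into key/value pairs
  separate(set, into = c('set', 'var'), sep = "_") %>%  # "A_var1" becomes set = "A", var = "var1"
  spread(var, vars, convert = TRUE)                     # one column per var; convert = TRUE turns var1 back into integers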
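
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider, so the same reshape can probably be done in a single step. This is a minimal sketch, assuming a recent tidyr is installed. It rebuilds mydf from the question so that it runs on its own, and long_df is just an illustrative name:

```r
library(tidyr)

# Same data as in the question (stringsAsFactors = FALSE keeps the letters as
# character on older R versions too)
mydf <- data.frame(meta1 = paste0("a", 1:2), meta2 = paste0("b", 1:2),
                   A_var1 = 11:12, A_var2 = c("p", "r"),
                   B_var1 = 21:22, B_var2 = c("x", "z"),
                   stringsAsFactors = FALSE)

# Split every non-meta column name at "_": the first piece is stored in a new
# `set` column, and the special ".value" entry tells pivot_longer() to use the
# second piece (var1, var2) as the name of an output column.
long_df <- pivot_longer(mydf,
                        cols = -c(meta1, meta2),
                        names_to = c("set", ".value"),
                        names_sep = "_")

print(long_df)
```

Because var1 and var2 never share a single value column here, their original types are kept and no conversion step is needed afterwards.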

Hope this helps!
