Joost Keuskamp

Reputation: 125

R: reshape dataframe from wide to long format based on compound column names

I have a dataframe containing observations for two sets of data (A, B), with the dataset and observation type given by the column names:

mydf <- data.frame(meta1=paste0("a",1:2), meta2=paste0("b",1:2), 
                   A_var1 = c(11:12), A_var2 = c("p","r"), 
                   B_var1 = c(21:22), B_var2 = c("x","z"))

I would like to reshape this dataframe so that each row contains observations on one set only. In this long format, the set and column names should be given by splitting the original column names at the '_':

mydf2 <- data.frame(meta1 = rep(paste0("a",1:2),2), 
                    meta2 = rep(paste0("b",1:2),2),
                    set = c("A","A","B","B"),
                    var1 = c(11:12, 21:22),
                    var2 = c("p","r","x","z"))

I have tried using 'gather' in combination with 'str_split' and 'sub', but unfortunately without success. Could this be done using tidyverse functions?

Upvotes: 2

Views: 304

Answers (1)

Matt W.

Reputation: 3722

Yes, you can do this with the tidyverse!

You were close: you need to gather, then separate, then spread.

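library(tidyr)  # provides gather(), separate(), spread() and re-exports %>%

new_df <- mydf %>%
  gather(set, vars, 3:6) %>%                            # stack columns 3 to 6 into key/value pairs
  separate(set, into = c('set', 'var'), sep = "_") %>%  # split e.g. "A_var1" into set "A" and var "var1"
  spread(var, vars, convert = TRUE)                     # one column per var; convert = TRUE turns var1 back into integers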
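
As an alternative sketch, assuming tidyr 1.0.0 or later is installed, pivot_longer() can do the whole reshape in one step. The special ".value" entry in names_to tells it to use the second part of each column name (var1, var2) as an output column, so the values are never stacked into a single mixed-type column. The name long_df is only illustrative.

```r
library(tidyr)

# Example data from the question (stringsAsFactors = FALSE keeps var2 as character on R < 4.0)
mydf <- data.frame(meta1 = paste0("a", 1:2), meta2 = paste0("b", 1:2),
                   A_var1 = 11:12, A_var2 = c("p", "r"),
                   B_var1 = 21:22, B_var2 = c("x", "z"),
                   stringsAsFactors = FALSE)

long_df <- pivot_longer(
  mydf,
  cols = -c(meta1, meta2),         # reshape everything except the metadata columns
  names_to = c("set", ".value"),   # first name part -> set, second part -> output column name
  names_sep = "_"                  # split the original column names at the underscore
)
```

Because var1 and var2 stay in separate columns throughout, var1 remains an integer column without any type conversion step.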

Hope this helps!

Upvotes: 1
