Ernest Presley
Ernest Presley

Reputation: 171

melt and reshape columns with similar column root word

I have a dataframe as follows

 id  gender group  Student_Math_1  Student_Math_2  Student_Read_1  Student_Read_2
 46  M      Red    23              45              37              56   
 46  M      Red    34              36              33              78 
 46  M      Red    56              63              58    
 62  F      Blue   59                                              68
 62  F      Blue                   68              87              73
 38  M      Red    78              57                              65
 38  M      Red                    75              54
 17  F      Blue   74                              56              72
 17  F      Blue   75              61                              79
 17  F      Blue                   74              43              81

    df = structure(list(id = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17), 
    gender = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 
    1L), .Label = c("F", "M"), class = "factor"), group = structure(c(2L, 
    2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), .Label = c("Blue", "Red"
    ), class = "factor"), Student_Math_1 = c(23, 34, 56, 59, 
    NA, 78, NA, 74, 75, NA), Student_Math_2 = c(45, 36, 63, NA, 
    68, 57, 75, NA, 61, 74), Student_Read_1 = c(37, 33, 58, NA, 
    87, NA, 54, 56, NA, 43), Student_Read_2 = c(56, 78, NA, 68, 
    73, 65, NA, 72, 79, 81)), .Names = c("id", "gender", "group", 
"Student_Math_1", "Student_Math_2", "Student_Read_1", "Student_Read_2"
), row.names = c(NA, -10L), class = "data.frame")

What I am trying to do is reshape this dataset such that the Student_Math_1 and Student_Math_2 columns are stacked up as single column Math one below the other and similarly the Student_Read_1 and Student_Read_2 columns are stacked up as a single column Reading as shown below

 id  gender group  Math Index1          Reading Index2

 46  M      Red    23  Student_Math_1   45     Student_Read_1           
 46  M      Red    34  Student_Math_1   36     Student_Read_1  
 46  M      Red    56  Student_Math_1   63     Student_Read_1  
 62  F      Blue   59  Student_Math_1          Student_Read_1                
 62  F      Blue       Student_Math_1   68     Student_Read_1     
 38  M      Red    78  Student_Math_1   57     Student_Read_1     
 38  M      Red        Student_Math_1   75     Student_Read_1   
 17  F      Blue   74  Student_Math_1          Student_Read_1                  
 17  F      Blue   75  Student_Math_1   61     Student_Read_1         
 17  F      Blue       Student_Math_1   74     Student_Read_1     

 46  M      Red    45  Student_Math_2   56     Student_Read_2
 46  M      Red    36  Student_Math_2   78     Student_Read_2 
 46  M      Red    63  Student_Math_2          Student_Read_2
 62  F      Blue       Student_Math_2   68     Student_Read_2
 62  F      Blue   68  Student_Math_2   73     Student_Read_2
 38  M      Red    57  Student_Math_2   65     Student_Read_2
 38  M      Red    75  Student_Math_2          Student_Read_2    
 17  F      Blue       Student_Math_2   72     Student_Read_2
 17  F      Blue   61  Student_Math_2   79     Student_Read_2
 17  F      Blue   74  Student_Math_2   81     Student_Read_2

Only know this can be achieved with reshaping or melting and changing from wide to long format, not sure how to proceed beyond this. Any help on achieving this transformation is much appreciated.

Upvotes: 3

Views: 191

Answers (3)

akrun
akrun

Reputation: 887158

We can use melt from data.table

library(data.table)
melt(setDT(df), measure = patterns("Math", "Read"), 
value.name = c("Math", "Read"))[, Index1 := names(df)[4:5][variable]
            ][, Index2 := names(df)[5:6][variable]][]  

Or another option is

pat <- c("Student_Math", "Student_Read")
cbind(df[rep(1:nrow(df), 2), 1:3], do.call(cbind, lapply(pat,
          function(nm) melt(df[grep(nm, names(df))]))))

Upvotes: 2

alistaire
alistaire

Reputation: 43344

With the tidyverse, you could gather each set of columns, then filter out any values where the indices don't match in number (assuming you don't want Student_*_1 and Student_*_2 combinations):

library(tidyverse)

df %>% gather(Index1, Math, contains('Math')) %>% 
    gather(Index2, Reading, contains('Read')) %>% 
    filter(parse_number(Index1) == parse_number(Index2))

##    id gender group         Index1 Math         Index2 Reading
## 1  46      M   Red Student_Math_1   23 Student_Read_1      37
## 2  46      M   Red Student_Math_1   34 Student_Read_1      33
## 3  46      M   Red Student_Math_1   56 Student_Read_1      58
## 4  62      F  Blue Student_Math_1   59 Student_Read_1      NA
## 5  62      F  Blue Student_Math_1   NA Student_Read_1      87
## 6  38      M   Red Student_Math_1   78 Student_Read_1      NA
## 7  38      M   Red Student_Math_1   NA Student_Read_1      54
## 8  17      F  Blue Student_Math_1   74 Student_Read_1      56
## 9  17      F  Blue Student_Math_1   75 Student_Read_1      NA
## 10 17      F  Blue Student_Math_1   NA Student_Read_1      43
## 11 46      M   Red Student_Math_2   45 Student_Read_2      56
## 12 46      M   Red Student_Math_2   36 Student_Read_2      78
## 13 46      M   Red Student_Math_2   63 Student_Read_2      NA
## 14 62      F  Blue Student_Math_2   NA Student_Read_2      68
## 15 62      F  Blue Student_Math_2   68 Student_Read_2      73
## 16 38      M   Red Student_Math_2   57 Student_Read_2      65
## 17 38      M   Red Student_Math_2   75 Student_Read_2      NA
## 18 17      F  Blue Student_Math_2   NA Student_Read_2      72
## 19 17      F  Blue Student_Math_2   61 Student_Read_2      79
## 20 17      F  Blue Student_Math_2   74 Student_Read_2      81

Upvotes: 1

Silence Dogood
Silence Dogood

Reputation: 3597

Using melt from reshape2 and passing appropriate inputs to id, measure.vars :

MathDF = melt(data = DF,id=c("id","gender","group"),measure.vars = c("Student_Math_1","Student_Math_2"),value.name = "Math",
    variable.name = "Index1")

ReadDF = melt(data = DF,id=c("id","gender","group"),measure.vars = c("Student_Read_1","Student_Read_2"),value.name = "Read",
    variable.name = "Index2")


mergeDF = merge(MathDF,ReadDF,by=c("id","gender","group"))

head(mergeDF)
# id gender group         Index1 Math         Index2 Read
# 1 46      M   Red Student_Math_1   23 Student_Read_1   37
# 2 46      M   Red Student_Math_1   23 Student_Read_1   33
# 3 46      M   Red Student_Math_1   23 Student_Read_1   58
# 4 46      M   Red Student_Math_1   23 Student_Read_2   78
# 5 46      M   Red Student_Math_1   23 Student_Read_2   NA
# 6 46      M   Red Student_Math_1   23 Student_Read_2   56

Upvotes: 1

Related Questions