Reputation: 171
I have a dataframe as follows
id gender group Student_Math_1 Student_Math_2 Student_Read_1 Student_Read_2
46 M Red 23 45 37 56
46 M Red 34 36 33 78
46 M Red 56 63 58
62 F Blue 59 68
62 F Blue 68 87 73
38 M Red 78 57 65
38 M Red 75 54
17 F Blue 74 56 72
17 F Blue 75 61 79
17 F Blue 74 43 81
df = structure(list(id = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
gender = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L,
1L), .Label = c("F", "M"), class = "factor"), group = structure(c(2L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), .Label = c("Blue", "Red"
), class = "factor"), Student_Math_1 = c(23, 34, 56, 59,
NA, 78, NA, 74, 75, NA), Student_Math_2 = c(45, 36, 63, NA,
68, 57, 75, NA, 61, 74), Student_Read_1 = c(37, 33, 58, NA,
87, NA, 54, 56, NA, 43), Student_Read_2 = c(56, 78, NA, 68,
73, 65, NA, 72, 79, 81)), .Names = c("id", "gender", "group",
"Student_Math_1", "Student_Math_2", "Student_Read_1", "Student_Read_2"
), row.names = c(NA, -10L), class = "data.frame")
What I am trying to do is reshape this dataset such that the Student_Math_1
and Student_Math_2
columns are stacked up as single column Math
one below the other and similarly the Student_Read_1
and Student_Read_2
columns are stacked up as a single column Reading
as shown below
id gender group Math Index1 Reading Index2
46 M Red 23 Student_Math_1 45 Student_Read_1
46 M Red 34 Student_Math_1 36 Student_Read_1
46 M Red 56 Student_Math_1 63 Student_Read_1
62 F Blue 59 Student_Math_1 Student_Read_1
62 F Blue Student_Math_1 68 Student_Read_1
38 M Red 78 Student_Math_1 57 Student_Read_1
38 M Red Student_Math_1 75 Student_Read_1
17 F Blue 74 Student_Math_1 Student_Read_1
17 F Blue 75 Student_Math_1 61 Student_Read_1
17 F Blue Student_Math_1 74 Student_Read_1
46 M Red 45 Student_Math_2 56 Student_Read_2
46 M Red 36 Student_Math_2 78 Student_Read_2
46 M Red 63 Student_Math_2 Student_Read_2
62 F Blue Student_Math_2 68 Student_Read_2
62 F Blue 68 Student_Math_2 73 Student_Read_2
38 M Red 57 Student_Math_2 65 Student_Read_2
38 M Red 75 Student_Math_2 Student_Read_2
17 F Blue Student_Math_2 72 Student_Read_2
17 F Blue 61 Student_Math_2 79 Student_Read_2
17 F Blue 74 Student_Math_2 81 Student_Read_2
Only know this can be achieved with reshaping or melting and changing from wide to long format, not sure how to proceed beyond this. Any help on achieving this transformation is much appreciated.
Upvotes: 3
Views: 191
Reputation: 887158
We can use melt
from data.table
library(data.table)
melt(setDT(df), measure = patterns("Math", "Read"),
value.name = c("Math", "Read"))[, Index1 := names(df)[4:5][variable]
][, Index2 := names(df)[5:6][variable]][]
Or another option is
pat <- c("Student_Math", "Student_Read")
cbind(df[rep(1:nrow(df), 2), 1:3], do.call(cbind, lapply(pat,
function(nm) melt(df[grep(nm, names(df))]))))
Upvotes: 2
Reputation: 43344
With the tidyverse, you could gather
each set of columns, then filter
out any values where the indices don't match in number (assuming you don't want Student_*_1
and Student_*_2
combinations):
library(tidyverse)
df %>% gather(Index1, Math, contains('Math')) %>%
gather(Index2, Reading, contains('Read')) %>%
filter(parse_number(Index1) == parse_number(Index2))
## id gender group Index1 Math Index2 Reading
## 1 46 M Red Student_Math_1 23 Student_Read_1 37
## 2 46 M Red Student_Math_1 34 Student_Read_1 33
## 3 46 M Red Student_Math_1 56 Student_Read_1 58
## 4 62 F Blue Student_Math_1 59 Student_Read_1 NA
## 5 62 F Blue Student_Math_1 NA Student_Read_1 87
## 6 38 M Red Student_Math_1 78 Student_Read_1 NA
## 7 38 M Red Student_Math_1 NA Student_Read_1 54
## 8 17 F Blue Student_Math_1 74 Student_Read_1 56
## 9 17 F Blue Student_Math_1 75 Student_Read_1 NA
## 10 17 F Blue Student_Math_1 NA Student_Read_1 43
## 11 46 M Red Student_Math_2 45 Student_Read_2 56
## 12 46 M Red Student_Math_2 36 Student_Read_2 78
## 13 46 M Red Student_Math_2 63 Student_Read_2 NA
## 14 62 F Blue Student_Math_2 NA Student_Read_2 68
## 15 62 F Blue Student_Math_2 68 Student_Read_2 73
## 16 38 M Red Student_Math_2 57 Student_Read_2 65
## 17 38 M Red Student_Math_2 75 Student_Read_2 NA
## 18 17 F Blue Student_Math_2 NA Student_Read_2 72
## 19 17 F Blue Student_Math_2 61 Student_Read_2 79
## 20 17 F Blue Student_Math_2 74 Student_Read_2 81
Upvotes: 1
Reputation: 3597
Using melt
from reshape2
and passing appropriate inputs to id
, measure.vars
:
MathDF = melt(data = DF,id=c("id","gender","group"),measure.vars = c("Student_Math_1","Student_Math_2"),value.name = "Math",
variable.name = "Index1")
ReadDF = melt(data = DF,id=c("id","gender","group"),measure.vars = c("Student_Read_1","Student_Read_2"),value.name = "Read",
variable.name = "Index2")
mergeDF = merge(MathDF,ReadDF,by=c("id","gender","group"))
head(mergeDF)
# id gender group Index1 Math Index2 Read
# 1 46 M Red Student_Math_1 23 Student_Read_1 37
# 2 46 M Red Student_Math_1 23 Student_Read_1 33
# 3 46 M Red Student_Math_1 23 Student_Read_1 58
# 4 46 M Red Student_Math_1 23 Student_Read_2 78
# 5 46 M Red Student_Math_1 23 Student_Read_2 NA
# 6 46 M Red Student_Math_1 23 Student_Read_2 56
Upvotes: 1