Reputation: 2424
My question is similar to this question. I'm trying to tidyr::gather
multiple columns. However, the solution provided in the link is less than ideal because the attributes are generally not identical across all columns and so they are dropped.
Note, I know how to do this with base R, but I'm trying to learn how to do the equivalent operation with tidyr and/or dplyr.
Below I've simulated some data (poorly, but quickly) that illustrate the situation I often find myself in (although I generally have far more columns that follow this same sort of a pattern). I've provided the base solution with stats::reshape
so you can see my desired output.
Any help would be much appreciated.
set.seed(123)
male_g6 <- rbinom(100, 1, .5)
ell_g6 <- rbinom(100, 1, .1)
sped_g6 <- rbinom(100, 1, .15)
pullouts_g6 <- rbinom(100, 5, .1)
disability_g6 <- replicate(100,
sample(
c("asd", "cd", "ed", "hi", "id", "ohi", "ld", "none"),
1,
prob = c(rep(0.01, 6), 0.05, 0.89)
)
)
score_g6 <- rnorm(100, 200, 10)
score_g7 <- score_g6 + 5 + rnorm(100, 0, 2)
score_g8 <- score_g7 + 5 + rnorm(100, 0, 2)
d <- data.frame(
SID = 1:100,
male_g6 = male_g6,
male_g7 = male_g6,
male_g8 = male_g6,
ell_g6 = ell_g6,
ell_g7 = ell_g6,
ell_g8 = ell_g6,
sped_g6 = sped_g6,
sped_g7 = sped_g6,
sped_g8 = sped_g6,
pullouts_g6 = pullouts_g6,
pullouts_g7 = pullouts_g6,
pullouts_g8 = pullouts_g6,
disability_g6 = disability_g6,
disability_g7 = disability_g6,
disability_g8 = disability_g6,
score_g6 = score_g6,
score_g7 = score_g7,
score_g8 = score_g8
)
ld <- stats::reshape(d,
idvar = "SID",
varying = list(
c("male_g6", "male_g7", "male_g8"),
c("ell_g6", "ell_g7", "ell_g8"),
c("sped_g6", "sped_g7", "sped_g8"),
c("pullouts_g6", "pullouts_g7", "pullouts_g8"),
c("disability_g6", "disability_g7", "disability_g8"),
c("score_g6", "score_g7", "score_g8")
),
v.names = c("male", "ell", "sped", "pullouts", "disability", "score"),
times = 6:8,
timevar = "Grade",
direction = "long"
)
ld <- ld[order(ld$SID), ]
Upvotes: 4
Views: 489
Reputation: 43334
You'll need to gather beyond what you want to end with so you can separate the grade level from the headers, after which you can spread back to wide form:
ld2 <- d %>% gather(var, val, -SID) %>% # gather to long form
# separate grade from variable names
separate(var, c('var', 'grade'), sep = '_g', convert = TRUE) %>%
spread(var, val, convert = TRUE) # spread back to wide
head(ld2)
## SID grade disability ell male pullouts score sped
## 1 1 6 cd 0 0 1 196.2440 0
## 2 1 7 cd 0 0 1 203.2739 0
## 3 1 8 cd 0 0 1 211.1347 0
## 4 2 6 none 0 1 0 194.3812 1
## 5 2 7 none 0 1 0 195.3957 1
## 6 2 8 none 0 1 0 202.4890 1
Upvotes: 2