Reputation: 155
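I have data where each student is rated by two raters on multiple questions. Each row contains a student ID and, for the first item, the first rater's ID, the first rating, the second rater's ID, and the second rating...
....and then it repeats for multiple items.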
It looks something like this:
Student_ID <- c(1:4)
Item1_first_rater_id <- c(1,2,1,2)
Item1_first_rating <- c(2,3,4,2)
Item1_second_rater_id <- c(2,3,2,3)
Item1_second_rating <- c(4,5,3,2)
Item2_first_rater_id <- c(4,2,5,1)
Item2_first_rating <- c(2,3,4,2)
Item2_second_rater_id <- c(6,7,2,3)
Item2_second_rating <- c(3,4,5,4)
wide <- data.frame(Student_ID, Item1_first_rater_id, Item1_first_rating,
Item1_second_rater_id, Item1_second_rating,
Item2_first_rater_id, Item2_first_rating,
Item2_second_rater_id, Item2_second_rating)
I need the data to be in a long format like this:
Student_ID <- c(1:4)
Item_number <- c(1,1,2,2)
Rater_id <- c(1:4)
Score <- c(2,3,4,5)
long <- data.frame(Student_ID, Item_number, Rater_id, Score)
Any ideas about how to reshape?
Thanks.
Upvotes: 0
Views: 911
Reputation: 193687
It isn't totally clear what you're trying to do (in other words, how exactly you want to transform your source data). Here is one guess that might at least get you closer to your desired output.
It seems like the names in your "wide" dataset contain three sets of information: (1) an item number, (2) a "time" (first or second), and (3) another variable (either "rating" or "rater id").
We can use melt, colsplit, and dcast to facilitate our reshaping.
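To make those three pieces concrete, here is a small sketch that applies the same renaming pattern used in the melt step to two column names on their own. The vectors example_names and renamed are just illustrative names, not part of your data.

```r
# Two of the column names from the "wide" dataset
example_names <- c("Item1_first_rater_id", "Item2_second_rating")

# Capture the item number, the "time", and the variable, then join them with dots
renamed <- gsub("Item([0-9])_(.*)_(rater_id|rating)", "\\1.\\2.\\3", example_names)
renamed
# [1] "1.first.rater_id" "2.second.rating"

# Splitting on the dots recovers the three pieces separately
strsplit(renamed, ".", fixed = TRUE)
```

Note that [0-9] matches a single digit only, so this assumes fewer than ten items; [0-9]+ would probably be the safer choice for more.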
melt the dataset
library(reshape2)
orignames <- names(wide) # Store the original names so we can replace them
names(wide) <- gsub("Item([0-9])_(.*)_(rater_id|rating)",
"\\1\\.\\2\\.\\3", names(wide))
# "melt" the dataset
m.wide <- melt(wide, id.vars="Student_ID")
head(m.wide)
#   Student_ID         variable value
# 1          1 1.first.rater_id     1
# 2          2 1.first.rater_id     2
# 3          3 1.first.rater_id     1
# 4          4 1.first.rater_id     2
# 5          1   1.first.rating     2
# 6          2   1.first.rating     3
colsplit
m.wide <- cbind(m.wide,
colsplit(m.wide$variable, "\\.",
c("Item", "Time", "Var")))
head(m.wide)
#   Student_ID         variable value Item  Time      Var
# 1          1 1.first.rater_id     1    1 first rater_id
# 2          2 1.first.rater_id     2    1 first rater_id
# 3          3 1.first.rater_id     1    1 first rater_id
# 4          4 1.first.rater_id     2    1 first rater_id
# 5          1   1.first.rating     2    1 first   rating
# 6          2   1.first.rating     3    1 first   rating
dcast to reshape the data
dcast(m.wide, Student_ID + Item ~ Time + Var, value.var="value")
#   Student_ID Item first_rater_id first_rating second_rater_id second_rating
# 1          1    1              1            2               2             4
# 2          1    2              4            2               6             3
# 3          2    1              2            3               3             5
# 4          2    2              2            3               7             4
# 5          3    1              1            4               2             3
# 6          3    2              5            4               2             5
# 7          4    1              2            2               3             2
# 8          4    2              1            2               3             4
Switching what's to the left and what's to the right of the ~ will affect the "shape" of your data.
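As one illustration, here is a self-contained sketch that moves Time to the left of the ~, so that only Var is spread into columns. I'm guessing this is close to the long layout in your question (one row per student, item, and rater, with the rater id and the score side by side). The name long is borrowed from your question, and the data is rebuilt at the top only so the block runs on its own.

```r
library(reshape2)

# Rebuild the example data from the question so this runs standalone
wide <- data.frame(
  Student_ID            = 1:4,
  Item1_first_rater_id  = c(1, 2, 1, 2),
  Item1_first_rating    = c(2, 3, 4, 2),
  Item1_second_rater_id = c(2, 3, 2, 3),
  Item1_second_rating   = c(4, 5, 3, 2),
  Item2_first_rater_id  = c(4, 2, 5, 1),
  Item2_first_rating    = c(2, 3, 4, 2),
  Item2_second_rater_id = c(6, 7, 2, 3),
  Item2_second_rating   = c(3, 4, 5, 4)
)

# Same renaming, melt, and colsplit steps as before
names(wide) <- gsub("Item([0-9])_(.*)_(rater_id|rating)",
                    "\\1.\\2.\\3", names(wide))
m.wide <- melt(wide, id.vars = "Student_ID")
m.wide <- cbind(m.wide,
                colsplit(m.wide$variable, "\\.",
                         c("Item", "Time", "Var")))

# With Time on the left, each student/item/time combination gets its own row
# and only "rater_id" and "rating" become columns
long <- dcast(m.wide, Student_ID + Item + Time ~ Var, value.var = "value")
head(long)
```

With 4 students, 2 items, and 2 raters per item, this gives 16 rows. If you don't need to know which rater was "first" or "second", you can drop the Time column afterwards.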
Upvotes: 1