windy

Reputation: 155

Change data from wide to long format in R

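I have data in which each student is rated by two raters on multiple questions. Each row contains a student ID and, for each item, four variables: the first rater's ID, the first rating, the second rater's ID, and the second rating. The same four variables then repeat for every additional item.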

It looks something like this:

Student_ID  <- c(1:4)
Item1_first_rater_id <- c(1,2,1,2)
Item1_first_rating <- c(2,3,4,2)
Item1_second_rater_id <- c(2,3,2,3)
Item1_second_rating <- c(4,5,3,2)
Item2_first_rater_id <- c(4,2,5,1)
Item2_first_rating <- c(2,3,4,2)
Item2_second_rater_id <- c(6,7,2,3)
Item2_second_rating <- c(3,4,5,4)

wide <- data.frame(Student_ID, Item1_first_rater_id, Item1_first_rating, 
                          Item1_second_rater_id, Item1_second_rating, 
                          Item2_first_rater_id, Item2_first_rating, 
                          Item2_second_rater_id, Item2_second_rating)

I need the data to be in a long format like this:

Student_ID  <- c(1:4)
Item_number <- c(1,1,2,2)
Rater_id <- c(1:4)
Score <- c(2,3,4,5)
long <- data.frame(Student_ID, Item_number, Rater_id, Score)

Any ideas about how to reshape?

Thanks.

Upvotes: 0

Views: 911

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193687

It isn't totally clear what you're trying to do (in other words, how exactly you want to transform your source data). Here is one guess that might at least get you closer to your desired output.

It seems like the names in your "wide" dataset contain three sets of information: (1) an item number, (2) a "time" (first or second), and (3) another variable (either "rating" or "rater id").
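To make that decomposition concrete, here is a small base R sketch that pulls the three pieces out of a few of the column names from the question. The anchored pattern and the `parts` data frame are my own illustration, not part of the reshaping steps that follow, and it assumes every name has the form `Item<number>_<first|second>_<rater_id|rating>`.

```r
# A few column names as they appear in the question's `wide` data frame
nms <- c("Item1_first_rater_id", "Item1_first_rating",
         "Item2_second_rater_id", "Item2_second_rating")

# Assumed naming scheme: Item<number>_<first|second>_<rater_id|rating>
pattern <- "^Item([0-9]+)_(first|second)_(rater_id|rating)$"

# Extract each capture group into its own column
parts <- data.frame(
  name = nms,
  Item = as.integer(sub(pattern, "\\1", nms)),  # item number
  Time = sub(pattern, "\\2", nms),              # "first" or "second"
  Var  = sub(pattern, "\\3", nms),              # "rater_id" or "rating"
  stringsAsFactors = FALSE
)
print(parts)
```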

We can use melt, colsplit, and dcast from the reshape2 package to do the reshaping.

Step 1: melt the dataset

library(reshape2)
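orignames <- names(wide) # Keep the original names; restore later with names(wide) <- orignames
# Rename e.g. "Item1_first_rater_id" to "1.first.rater_id" so the parts can be split on "."
names(wide) <- gsub("Item([0-9])_(.*)_(rater_id|rating)", 
                    "\\1\\.\\2\\.\\3", names(wide))
# "melt" the dataset: one row per student per original column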
m.wide <- melt(wide, id.vars="Student_ID")
head(m.wide)
#   Student_ID         variable value
# 1          1 1.first.rater_id     1
# 2          2 1.first.rater_id     2
# 3          3 1.first.rater_id     1
# 4          4 1.first.rater_id     2
# 5          1   1.first.rating     2
# 6          2   1.first.rating     3

Step 2: Create the new columns using colsplit

m.wide <- cbind(m.wide, 
                colsplit(m.wide$variable, "\\.", 
                         c("Item", "Time", "Var")))
head(m.wide)
#   Student_ID         variable value Item  Time      Var
# 1          1 1.first.rater_id     1    1 first rater_id
# 2          2 1.first.rater_id     2    1 first rater_id
# 3          3 1.first.rater_id     1    1 first rater_id
# 4          4 1.first.rater_id     2    1 first rater_id
# 5          1   1.first.rating     2    1 first   rating
# 6          2   1.first.rating     3    1 first   rating

Step 3: Use dcast to reshape the data

dcast(m.wide, Student_ID + Item ~ Time + Var, value.var="value")
#   Student_ID Item first_rater_id first_rating second_rater_id second_rating
# 1          1    1              1            2               2             4
# 2          1    2              4            2               6             3
# 3          2    1              2            3               3             5
# 4          2    2              2            3               7             4
# 5          3    1              1            4               2             3
# 6          3    2              5            4               2             5
# 7          4    1              2            2               3             2
# 8          4    2              1            2               3             4

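Variables on the left of the ~ stay as identifier columns (one row per unique combination), while variables on the right are spread into separate columns. Moving a variable from one side to the other therefore changes the "shape" of your data.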
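As one illustration, here is a minimal sketch that moves Time to the left of the ~, giving one row per student, item, and rater. This assumes that is the long layout you are after, which is still a guess. The block rebuilds the example data so it runs on its own, and the final names Item_number, Rater_id, and Score are borrowed from the desired output in the question.

```r
library(reshape2)

# Rebuild the question's example data so this block is self-contained
wide <- data.frame(
  Student_ID            = 1:4,
  Item1_first_rater_id  = c(1, 2, 1, 2),
  Item1_first_rating    = c(2, 3, 4, 2),
  Item1_second_rater_id = c(2, 3, 2, 3),
  Item1_second_rating   = c(4, 5, 3, 2),
  Item2_first_rater_id  = c(4, 2, 5, 1),
  Item2_first_rating    = c(2, 3, 4, 2),
  Item2_second_rater_id = c(6, 7, 2, 3),
  Item2_second_rating   = c(3, 4, 5, 4)
)

# Steps 1 and 2 as above: rename, melt, and split the variable names
names(wide) <- gsub("Item([0-9])_(.*)_(rater_id|rating)",
                    "\\1.\\2.\\3", names(wide))
m.wide <- melt(wide, id.vars = "Student_ID")
m.wide <- cbind(m.wide,
                colsplit(as.character(m.wide$variable), "\\.",
                         c("Item", "Time", "Var")))

# Time now sits on the left, so only Var (rater_id / rating) is spread out
long <- dcast(m.wide, Student_ID + Item + Time ~ Var, value.var = "value")

# Rename to match the column names used in the question (assumed mapping)
names(long)[match(c("Item", "rater_id", "rating"), names(long))] <-
  c("Item_number", "Rater_id", "Score")

head(long)
```

Drop the Time column afterwards if you don't need to know which rating was the first and which the second.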

Upvotes: 1
