Reputation: 323
I have some data to reshape in R but cannot figure out how. Here is the scenario: I have test score data for a number of students from different schools. Here is some example data:
#Create example data:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
Resulting in a data format like this:
score schoolid
1 1
10 1
20 2
40 2
20 3
So, there is a school ID which identifies the school, and there is a test score for each student. For an analysis in a different program, I would like to have the data in a format like this:
                 Score student 1   Score student 2
School ID == 1   1                 10
School ID == 2   20                40
School ID == 3   20                NA
To reshape the data, I tried to use the reshape and the cast function from the reshape2 library, but this resulted in errors:
#Reshape function
reshape(test, v.names = test2$score, idvar = test2$schoolid, direction = "wide")
reshape(test, idvar = test$schoolid, direction = "wide")
#Error in `[.data.frame`(data, , idvar) : undefined columns selected
#Cast function
cast(test,test$schoolid~test$score)
#Error: Error: could not find function "cast" (although ?cast works fine)
I guess that the fact that the number of test scores differs between schools complicates the restructuring process.
How can I reshape this data, and which function should I use?
Upvotes: 2
Views: 4712
Reputation: 269654
Here are some solutions that use only base R. All three solutions use this new studentno variable:
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
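To make the role of studentno concrete, here is a minimal self-contained sketch that rebuilds the example data from the question and prints the counter. ave applies seq_along separately within each schoolid group, so each student is numbered 1, 2, ... inside their own school, in row order.

```r
# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# Number the students within each school: seq_along restarts at 1 per schoolid
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
studentno
# [1] 1 2 1 2 1
```

This counter plays the part of the "time" variable that wide reshaping needs: it says which column each score should land in.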
1) tapply
with(test, tapply(score, list(schoolid, studentno), c))
giving:
1 2
1 1 10
2 20 40
3 20 NA
2) reshape
# rename score to student and append studentno column
test2 <- transform(test, student = score, score = NULL, studentno = studentno)
reshape(test2, dir = "wide", idvar = "schoolid", timevar = "studentno")
giving:
schoolid student.1 student.2
1 1 1 10
3 2 20 40
5 3 20 NA
3) xtabs
xtabs would also work if there are no students with a score of 0.
xt <- xtabs(score ~ schoolid + studentno, test)
xt[xt == 0] <- NA # omit this step if it's OK to use 0 in place of NA
xt
giving:
studentno
schoolid 1 2
1 1 10
2 20 40
3 20
Upvotes: 5
Reputation: 11597
You have to define the student id somewhere, for example:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
test$studentid <- c(1,2,1,2,1)
library(reshape2)
dcast(test, schoolid ~ studentid, value.var = "score", fun.aggregate = mean)
schoolid 1 2
1 1 1 10
2 2 20 40
3 3 20 NaN
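If typing the student IDs by hand is impractical, one possible variation is to compute them per school with ave and to leave out the aggregation function. This is only a sketch, assuming reshape2 is installed and that each schoolid/studentid pair occurs exactly once; the name wide is just an illustrative choice. With no aggregation function, dcast fills the missing combination with NA rather than NaN.

```r
library(reshape2)

# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# Number the students within each school instead of hard-coding the IDs
test$studentid <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# One row per school, one column per student number; no aggregation needed
# because every schoolid/studentid combination appears at most once
wide <- dcast(test, schoolid ~ studentid, value.var = "score")
wide
```

Here school 3 has only one student, so its second score column is NA.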
Upvotes: 3