Reputation: 1033
This is really basic, but I am getting stuck with overly complicated code. I have a CSV file with a column of tests, a column of marks, and a column of students. I would like to reformat the data so that each row holds one student's marks and each column is a test.
I created a separate CSV file called "students.csv" that contains the students (as number codes), as this was easier for now.
I have 52 students and 50 tests.
I can get the following to work with a single student:
matricNumbers <- read.csv("students.csv")
students <- as.vector(as.matrix(matricNumbers))
students
data <- read.csv("marks.csv")
studentSubset <- data[data[2] == 1150761,]
marksSubset <- as.vector(as.matrix(studentSubset[5]))
ll <- list()
ll<-c(list(marksSubset), ll)
dd<-data.frame(matrix(nrow=50,ncol=50))
for(i in 1:length(ll)){
  dd[i,] <- ll[[i]]
}
dd
but I can't seem to get this to work with a for loop that goes through every student:
getMarks <-function(studentNumFile,markFile){
  matricNumbers <- read.csv(studentNumFile)
  students <- as.vector(as.matrix(matricNumbers))
  data <- read.csv(markFile)
  for (i in seq_along(students)){
    studentSubset <- data[data[2] == i,]
    marksSubset <- as.vector(as.matrix(studentSubset[5]))
    ll <- list()
    ll<-c(list(marksSubset), ll)
    dd<-data.frame(matrix(nrow=52,ncol=50))
    for(i in 1:length(ll)){
      dd[i,] <- ll[[i]]
    }
  }
  return(dd)
}
getMarks("students.csv","marks.csv")
I am getting the error:
Error in `[<-.data.frame`(`*tmp*`, i, , value = logical(0)) : replacement has 0 items, need 50
I am sure this is due to the nested for loop, but I can't figure out how to do this otherwise.
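For what it's worth, the error does not come from the nesting as such. Inside the loop, data[data[2] == i,] compares the student-number column with the loop index i (1, 2, 3, ...) rather than with students[i]; presumably no student number equals 1, 2, 3, ..., so no rows match, marksSubset is empty, and that zero-length value is what the error message complains about. In addition, ll and dd are re-created on every pass, so at most one student could ever end up in the result. Below is a minimal sketch of a corrected loop, not a definitive fix. It assumes, as the question's code does, that student numbers are in column 2 and marks in column 5 of marks.csv; it also assumes that students.csv has a header row and a single column of numbers, and that every student has one mark per test, listed in the same test order. The function name getMarksLoop is hypothetical.

```r
# Hypothetical corrected version of getMarks(). Assumptions (not from the question):
# students.csv has a header and one column of student numbers; in marks.csv the
# student numbers are in column 2 and the marks in column 5 (as in the question);
# every student has exactly one mark per test, listed in the same test order.
getMarksLoop <- function(studentNumFile, markFile) {
  students <- read.csv(studentNumFile)[[1]]  # vector of student numbers
  data <- read.csv(markFile)

  # Number of tests, taken from how many rows the first student has
  nTests <- sum(data[[2]] == students[1])

  # Allocate the result once, outside the loop: one row per student
  dd <- matrix(NA_real_, nrow = length(students), ncol = nTests)

  for (i in seq_along(students)) {
    # Compare against the i-th student number, not against the index i itself
    marks <- data[data[[2]] == students[i], 5]
    # Fail loudly instead of letting R silently recycle a short vector
    if (length(marks) != nTests) {
      stop("student ", students[i], " has ", length(marks),
           " marks, expected ", nTests)
    }
    dd[i, ] <- marks
  }

  rownames(dd) <- students
  as.data.frame(dd)
}

# Usage, with the file names from the question:
# getMarksLoop("students.csv", "marks.csv")
```

The result has one row per student (row names are the student numbers) and generic column names V1, V2, ...; which column of marks.csv holds the test names is not stated in the question, so the sketch does not try to label the columns.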
Upvotes: 0
Views: 215
Reputation: 59980
You can use the reshape package to achieve what you want, if I understand the problem correctly. As you don't provide sample data, it is hard to test. I advise you to paste the output of dput( head( data ) ) into a code block above for this purpose.
However, you should be able to follow this simple example, which uses some dummy data. I think you may only need one line, and you can forget all the complicated loop stuff!
# These lines make some dummy data, similar to your marks data (hopefully)
test = sort(sample(c("Biology","Maths","Chemistry") , 10 , replace = TRUE ))
# lapply (not sapply) so the result is always a list, even if every test gets the same count
students = unlist( lapply(table(test), function(x) { sample( letters[1:x] , x ) } ) )
names(students) <- NULL
scores <- data.frame( test , mark = sample( 40:100 , 10 , replace = TRUE ) , students )
scores
        test mark students
1    Biology   50        c
2    Biology   93        a
3    Biology   83        b
4    Biology   83        d
5  Chemistry   71        b
6  Chemistry   54        c
7  Chemistry   54        a
8      Maths   97        c
9      Maths   93        b
10     Maths   72        a
# Then use reshape to cast your data into the format you require
# I use 'mean' as the aggregation function. If you have one score for each student/test, then mean will just return the score
# If you do not have a score for a particular student in that test then it will return NaN
library( reshape )
bystudent <- cast( scores , students ~ test , value = "mark" , mean )
bystudent
  students Biology Chemistry Maths
1        a      93        54    72
2        b      83        71    93
3        c      50        54    97
4        d      83       NaN   NaN
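If you would rather not depend on an extra package, base R's tapply() can produce the same student-by-test layout. This is a minimal sketch using a small hand-made data frame (marksLong, a hypothetical name) with the same test / mark / students columns as the dummy scores data above. Unlike cast(), tapply() returns a matrix, and it leaves student/test combinations that have no mark as NA rather than NaN.

```r
# Base R alternative (no reshape package): cross-tabulate marks by student and test.
# 'marksLong' is a hypothetical stand-in for long-format data with columns
# 'test', 'mark' and 'students', like the dummy 'scores' data above.
marksLong <- data.frame(
  test     = c("Biology", "Biology", "Chemistry", "Maths"),
  mark     = c(50, 93, 71, 97),
  students = c("c", "a", "b", "c")
)

# One row per student, one column per test; mean() aggregates any duplicate
# student/test marks, and combinations with no mark are left as NA
bystudent <- tapply(marksLong$mark, list(marksLong$students, marksLong$test), mean)
bystudent
```

With the dummy data above, tapply(scores$mark, list(scores$students, scores$test), mean) gives the equivalent of bystudent; wrap the result in as.data.frame() if you need a data frame rather than a matrix.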
Upvotes: 1