Bas

Reputation: 1076

R: if statements in loop

Basically a followup on this question.

I'm still trying to get a grasp of vectorisation in R while speeding up a coworker's code. I've read The R Inferno and "Speed up the loop operation in R".

My aim is to speed up the following code. The complete dataset contains ~1000 columns by 10,000 to 1,000,000 rows:

df3 <- structure(c("X", "X", "X", "X", "O", "O", "O", "O", "O", "O", 
"O", "O", "O", "O", "O", "O"), .Dim = c(2L, 8L), .Dimnames = list(
    c("1", "2"), c("pig_id", "code", "DSFASD32", "SDFSD56", 
    "SDFASD12", "SDFSD56342", "SDFASD12231", "SDFASD45442"
    )))

score_1 <- structure(c(0, 0, 0, 0, 0, 0), .Dim = 2:3)


# count number of X (error), N (not compared) and O (ok) per row
for (i in 1:nrow(df3)) {
  a <- matrix(table(df3[i, 3:ncol(df3)]))

  if (nrow(a) == 1) {
    score_1[i, 1] <- 0
    score_1[i, 2] <- a[1, 1]
  }
  if (nrow(a) == 2) {
    score_1[i, 1] <- a[1, 1]
    score_1[i, 2] <- a[2, 1]
  }
  if (nrow(a) == 3) {
    score_1[i, 1] <- a[1, 1]
    score_1[i, 2] <- a[2, 1]
    score_1[i, 3] <- a[3, 1]
  }
}
colnames(score_1) <- c("N", "O", "X")

I have been trying this myself but can't figure it out yet. Here is what I've tried. It shows the same output as the code above, but I'm not sure if it actually does the same thing. I'm missing that bit of insight into R and my data set.

I can't seem to get my code to give the same output as the for loop.


Edit: In response to Heroka, I've updated my reproducible example:

Output of the for loop:

     [,1] [,2] [,3]
[1,]    0    6    0
[2,]    0    6    0

Output of the apply function:

     1 2
[1,] 6 6

Upvotes: 0

Views: 135

Answers (1)

Heroka

Reputation: 13139

This gives you the desired table. Converting each row to a factor with fixed levels forces letters that do not occur in that row to be counted as zero. It is less computationally efficient than just using apply and table, though.

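# drop the two id columns, then tabulate each row against fixed levels
res <- t(apply(df3[, -(1:2)], 1, function(x) {
  # fixed levels keep N, O and X as columns even when their count is 0
  x_f <- factor(x, levels = c("N", "O", "X"))
  table(x_f)
}))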

> res
  N O X
1 0 6 0
2 0 6 0
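As a possible alternative that avoids calling table once per row, the counts can be sketched with one rowSums call per level on a logical matrix. This is not part of the original answer. The matrix `calls` and the vector `levels_wanted` are made-up names for illustration, and `calls` stands in for `df3[, -(1:2)]`.

```r
# Hypothetical example data: 2 rows, 6 marker columns (stand-in for df3[, -(1:2)])
calls <- matrix(c("O", "X", "O", "N", "O", "O",
                  "O", "O", "O", "O", "O", "O"),
                nrow = 2, byrow = TRUE)

# The three codes we want a column for, in output order
levels_wanted <- c("N", "O", "X")

# calls == l is a logical matrix; rowSums counts the TRUEs in each row.
# sapply names the resulting columns after the elements of levels_wanted.
res_fast <- sapply(levels_wanted, function(l) rowSums(calls == l))

res_fast
```

With only three levels this makes three passes over the whole matrix instead of one function call per row, which tends to matter when there are many rows. Note that sapply returns a plain vector rather than a matrix if the input has a single row.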

For a smaller dataset, melting the data first might be an option, but with 1e6 rows and 100 columns you'd need a lot of memory.
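To illustrate the long-format idea in base R, without a melting package, here is a minimal sketch that cross-tabulates a row index against the cell values in a single table call. The names `calls` and `long_res` are hypothetical, and `calls` again stands in for `df3[, -(1:2)]`.

```r
# Hypothetical example data: 2 rows, 6 marker columns
calls <- matrix(c("O", "X", "O", "N", "O", "O",
                  "O", "O", "O", "O", "O", "O"),
                nrow = 2, byrow = TRUE)

# row(calls) holds the row index of every cell, so pairing it with the
# cell values gives one (row, value) observation per cell.
# Fixed factor levels keep N, O and X as columns even when a count is 0.
long_res <- table(row(calls), factor(calls, levels = c("N", "O", "X")))

long_res
```

The row index matrix is as large as the data itself, which shows the memory cost mentioned above.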

Upvotes: 2
