user1876508

Reputation: 13172

Speed up text processing in R

I am optimizing a piece of software, and the most expensive lines are the ones that do text processing. By commenting out sections of the program one at a time, I found that the condition of a single if-statement accounts for most of the bottleneck. The statement tests whether

allele1 %in% rownames(seqMat)

is true and, if so, runs the statements that follow. This if-statement is evaluated thousands of times in a loop and slows the program down significantly. How can the condition be rewritten to speed up the program?

Upvotes: 0

Views: 186

Answers (1)

flodel

Reputation: 89097

You could call %in% only once for all your alleles and store its output for reuse inside the loop. Here is a proof of concept:

a <- sample(1:1000, 100000, replace = TRUE)
b <- -1000:1000

system.time({
    stored <- a %in% b
    for (i in seq_along(a))
        stored[i]
}) 
#    user  system elapsed 
#   0.056   0.001   0.056 

system.time({
    for (i in seq_along(a))
        a[i] %in% b
})
#    user  system elapsed 
#   3.634   0.374   3.957
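
Applied to the situation in the question, the same idea might look like the sketch below. It assumes all allele names are available in a vector before the loop starts; the vector `alleles`, the toy `seqMat`, and the `hits` accumulator are hypothetical stand-ins, since the question does not show the real data or the loop body.

```r
# Hypothetical stand-ins for the asker's data: a matrix with named rows
# and a vector of allele names to look up.
seqMat <- matrix(0, nrow = 3, ncol = 2,
                 dimnames = list(c("A1", "A2", "A3"), NULL))
alleles <- c("A2", "B7", "A1", "A2")

# One vectorized membership test, done once before the loop.
present <- alleles %in% rownames(seqMat)

hits <- character(0)
for (i in seq_along(alleles)) {
    if (present[i]) {
        # Placeholder for the work that previously sat behind
        # `allele1 %in% rownames(seqMat)`.
        hits <- c(hits, alleles[i])
    }
}
print(hits)
```

If the alleles only become known one at a time inside the loop, this precomputation does not apply directly.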

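Also, Hadley's suggestion of using any and == is a much smaller improvement. It is roughly twice as fast as calling %in% inside the loop, but still far slower than the precomputed version: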

system.time({
    for (i in seq_along(a))
        any(a[i] == b)
})
#    user  system elapsed 
#   1.661   0.164   1.835 

Upvotes: 4
