Reputation: 13172
I am optimizing a piece of software whose most expensive lines are the text-processing ones. By commenting out sections of the program one at a time, I found that a single condition in an if statement causes most of the slowdown. The statement checks whether
allele1 %in% rownames(seqMat)
is true and, if so, runs the statements that follow. This if statement is evaluated thousands of times inside a loop and slows the program down significantly. How can I change the condition to speed up the program?
Upvotes: 0
Views: 186
Reputation: 89097
You could call %in% only once for all your alleles and store its output for reuse inside the loop. Here is a proof of concept:
a <- sample(1:1000, 100000, replace = TRUE)
b <- -1000:1000
system.time({
  stored <- a %in% b
  for (i in seq_along(a))
    stored[i]
})
#    user  system elapsed
#   0.056   0.001   0.056
system.time({
  for (i in seq_along(a))
    a[i] %in% b
})
#    user  system elapsed
#   3.634   0.374   3.957
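Also, Hadley's suggestion of using any and == is not that big of an improvement. It roughly halves the time of the per-element %in% loop, but it is still far slower than the single vectorized call: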
system.time({
  for (i in seq_along(a))
    any(a[i] == b)
})
#    user  system elapsed
#   1.661   0.164   1.835
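Applied to the situation in the question, the same idea might look like the sketch below. It assumes all alleles are available as a vector before the loop starts. The `alleles` vector, the toy `seqMat`, and the `hits` accumulator are hypothetical stand-ins, not taken from the original program.

```r
# Hypothetical stand-ins for the question's data (assumptions, not the real objects)
seqMat <- matrix(0, nrow = 3, ncol = 2,
                 dimnames = list(c("A1", "A2", "A3"), NULL))
alleles <- c("A1", "B7", "A3", "A1")

# One vectorized membership test, done once before the loop
present <- alleles %in% rownames(seqMat)

hits <- character(0)
for (i in seq_along(alleles)) {
  # Cheap logical lookup replaces the repeated %in% call
  if (present[i]) {
    # Placeholder for the statements that depended on the test
    hits <- c(hits, alleles[i])
  }
}
```

If the alleles are only produced one at a time inside the loop, this precomputation does not apply directly and a different approach would be needed.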
Upvotes: 4