Reputation: 105
I have a function that I will be calling heavily (roughly 10^11 times per iteration in an optimisation, for a few different experiments). I have implemented a fast version, but I am struggling to see how I can improve performance. The 'system' time is minimal, and the user time is high.
Here's the code, it takes in an integer and returns a vector that represents that integer is a different base counting system (eg base = 2 gives binary, base = 10 gives the standard result back). The parameter k gives the length of the vector to return, so there may be many zeros at the front.
As you'll see, the functions take 5 or 7 seconds to run, but none of it is system time. I would like to understand why, and if there are ways to speed it up. I have this same problem with other functions (99% of the time is spend in one function in a loop, but speeding it up 200fold only halved the run-time), but am showing this one for clarity.
library(Rcpp)
library(microbenchmark)
# Rcpp version of the function
cppFunction('
NumericVector convert10tobase_c(double number_b10, int k, int base = 4){
if(number_b10 >= pow(base, k)){
stop("k is not large enough to contain the number in this base");
}
NumericVector ret(k);
if(k == 1){
return number_b10;
}
for (int i = 1 ;i < k; i++){
double entry = floor(number_b10 / pow(base, (k - i)));
ret[i-1] = entry;
number_b10 = number_b10 - entry * pow(base, (k - i));
}
ret[k-1] = number_b10;
return ret;
}')
# R version of the function
convert10tobase <- function(number_b10, k, base = 5){
if(number_b10 >= base ^ k){
stop("k is not large enough to contain the number in this base")
}
ret <- rep(0, k)
if(k == 1){
return(number_b10)
}
for (i in 1:(k - 1)){
entry <- floor(number_b10 / base^(k-i))
ret[i] <- entry
number_b10 <- number_b10 - entry * (base ^ (k - i))
}
ret[k] <- number_b10
return(ret)
}
# generate test data one hundred, thousand and million integers
set.seed(1)
base <- 4
k <- 8
ints_short <- floor(runif(1e2) * (base^k))
ints_long <- floor(runif(1e4) * (base^k))
ints_v_long <- floor(runif(1e6) * (base^k))
# benchmark the Rcpp version
microbenchmark(
one = convert10tobase_c(ints_short[1], k, base),
hundred = sapply(1:length(ints_short), function(i) convert10tobase_c(ints_short[i], k, base)),
ten_thous = sapply(1:length(ints_long), function(i) convert10tobase_c(ints_long[i], k, base)),
times = 100)
# test R and Rcpp times
r_start <- proc.time()
t <- sapply(1:length(ints_v_long), function(i) convert10tobase(ints_v_long[i], k, base))
r_stop <- proc.time()
c_start <- proc.time()
t <- sapply(1:length(ints_v_long), function(i) convert10tobase_c(ints_v_long[i], k, base))
c_stop <- proc.time()
# results - little time in 'system'
r_stop - r_start
c_stop - c_start
As an aside, I included a comparison of calling the function once, a hundred times and a hundred thousand times. The time for one hundred calls was 300 times slower than for one, while for ten thousand calls was thirty times slower than one hundred calls. I would like to understand why, and would appreciate any resources that could explain it.
Thanks!
Upvotes: 2
Views: 314
Reputation: 13139
R is very good at efficiently doing the same thing to multiple similar things. Therefore, your code gets more efficient if you group similar things together before doing something. This can be a bit tricky to start with, especially when you've arrived from another coding background.
Here is a solution where your function is vectorized within R (unsure how this relates to C++ loop, probably something internal). It can probably be optimized further, but it's 100x faster than using sapply
for each individual number. It returns a matrix with one row per number fed to it, and a column for each entry. When a number is larger than base ^k
, a row of NA's is returned. In further work with the output, this row can then easily be identified.
convert10tobase_v <- function(number_b10, k, base = 5){
orig_b10 <- number_b10 #save original for check after
if(k == 1){
return(number_b10)
}
#initialize matrix to store results
ret <- matrix(0, ncol=k, nrow=length(number_b10))
#tiny-forloop, won't influenc performance and makes
#storing results/modifying number_b10 easier
for (i in 1:(k - 1)){
entry <- floor(number_b10 / base^(k-i))
ret[,i] <- entry
number_b10 <- number_b10 - entry * (base ^ (k - i))
}
ret[,k] <- number_b10
ret[orig_b10 >= base ^ k,] <- NA #set 'too large' numbers to missing
return(ret)
}
Microbenchmark:
Unit: microseconds
expr min lq mean median uq max neval cld
one_single 20.216 25.1910 31.94323 29.079 37.6310 58.469 100 a
hundred_singles 2217.461 2317.9145 2499.23338 2386.336 2498.4525 4436.476 100 b
ten_thous_singles 240467.874 246613.1635 253205.12598 249890.060 252432.2090 307050.155 100 c
one_v 22.703 26.5910 33.09706 30.323 36.3875 62.823 100 a
hundred_v 53.181 56.9135 68.05703 61.889 75.5740 129.066 100 a
ten_thous_v 2641.359 2707.2920 2806.83843 2744.613 2827.9620 4645.160 100 b
Upvotes: 5