user3469517

Optimize R code

I want to optimize my R function for calculating the Gini mean difference:

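gini.md <- function(x)
{
  n  = length(x)
  nm = n + 1
  x = sort(x)  # ascending order: x[1] <= ... <= x[n]
  # weight the i-th smallest value by 2*i - (n + 1), sum, scale by 2/n^2
  return (2/n^2 * sum((2*(1:n) - nm) * x))
}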
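The sorted, weighted sum in gini.md is a rearrangement of the pairwise definition. A short derivation, added for reference, where x_(k) denotes the k-th smallest value: in sorted order, x_(k) is the larger element in k-1 pairs and the smaller element in n-k pairs, so its net coefficient in the sum of pairwise differences is (k-1) - (n-k) = 2k - n - 1.

$$
\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl\lvert x_{(i)}-x_{(j)}\bigr\rvert
= 2\sum_{i<j}\bigl(x_{(j)}-x_{(i)}\bigr)
= 2\sum_{k=1}^{n}\bigl[(k-1)-(n-k)\bigr]\,x_{(k)}
= 2\sum_{k=1}^{n}(2k-n-1)\,x_{(k)}
$$

$$
\texttt{gini.md}(x) = \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\bigl\lvert x_{(i)}-x_{(j)}\bigr\rvert
= \frac{2}{n^{2}}\sum_{k=1}^{n}\bigl(2k-(n+1)\bigr)\,x_{(k)}, \qquad \texttt{nm} = n+1
$$

Dividing by n^2 averages over all n^2 ordered pairs, including the n zero terms with i = j. The more common definition of the Gini mean difference divides the double sum by n(n-1) instead, so the two differ by a factor of (n-1)/n.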

Do you have any idea how to make it faster? Generating the sequences with seq() was slow. bitwShiftL(1:n, 1) is slower than 2*(1:n); how is that possible?

Moreover, I found that mean(x) is slower than sum(x)/length(x). Again, why? mean is an internal function, so it should be faster.

Answers (1)

Martin Morgan

Ignoring my own advice, I guessed that the most likely source of any slowness is the unnecessary creation of long vectors. The following C implementation avoids creating four intermediate vectors of length n (1:n, 2 * (1:n), 2 * (1:n) - nm, and finally (2*(1:n)-nm)*x).

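library(inline)
# The C body assumes x is a double vector that is already sorted,
# so call it as gini(sort(x)).
gini <- cfunction(signature(x="REALSXP"), "
    double n = Rf_length(x), nm = n + 1, ans = 0;
    const double *xp = REAL(x);
    /* one pass over the sorted values; no temporary vectors */
    for (int i = 0; i < n; ++i)
        ans += (2 * (i + 1) - nm) * xp[i];
    return ScalarReal(2 * ans / (n * n));
")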

but this doesn't help much. I realized after the fact that evaluation time is dominated by sort():

> library(microbenchmark)
> x <- rnorm(100000)
> all.equal(gini.md(x), gini(sort(x)))
[1] TRUE
> microbenchmark(gini.md(x), gini(sort(x)), sort(x), times=10)
Unit: milliseconds
          expr       min       lq     mean   median       uq      max neval
    gini.md(x) 10.668591 10.98063 11.09274 11.03377 11.20588 11.62714    10
 gini(sort(x)) 10.439458 10.64972 10.78242 10.70099 10.93015 11.36177    10
       sort(x)  9.995886 10.18180 10.31508 10.27024 10.46160 10.66006    10

Maybe there's more speed to be had, but any gain will be similarly marginal.
