Reputation: 126
I want to optimize my R function for calculating the Gini mean difference:
gini.md <- function(x)
{
    n = length(x)
    nm = n + 1
    x = sort(x)
    return(2 / n^2 * sum((2 * (1:n) - nm) * x))
}
Do you have any idea how to make it faster? Generating sequences with seq was slow. bitwShiftL((1:n), 1) is slower than 2 * (1:n). How is that possible?
Moreover, I found that mean(x) is slower than sum(x)/length(x). Again, why? mean is an internal function, so it should be faster.
Upvotes: 0
Views: 206
Reputation: 46856
Ignoring my own advice, I guessed that the most likely source of any speed problem is unnecessary creation of long vectors. The following C implementation avoids creating four vectors (1:n, 2 * (1:n), 2 * (1:n) - nm, and finally (2*(1:n)-nm)*x).
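library(inline)
gini <- cfunction(signature(x="REALSXP"), "
    /* x must already be sorted in increasing order */
    double n = Rf_length(x), nm = n + 1, ans = 0;
    const double *xp = REAL(x);
    /* accumulate the weighted sum without allocating temporary vectors */
    for (int i = 0; i < n; ++i)
        ans += (2 * (i + 1) - nm) * xp[i];
    return ScalarReal(2 * ans / (n * n));
")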
But this doesn't seem to help much. I realized after the fact that evaluation time is dominated by sort().
> library(microbenchmark)
> x <- rnorm(100000)
> all.equal(gini.md(x), gini(sort(x)))
[1] TRUE
> microbenchmark(gini.md(x), gini(sort(x)), sort(x), times=10)
Unit: milliseconds
          expr       min       lq     mean   median       uq      max neval
    gini.md(x) 10.668591 10.98063 11.09274 11.03377 11.20588 11.62714    10
 gini(sort(x)) 10.439458 10.64972 10.78242 10.70099 10.93015 11.36177    10
       sort(x)  9.995886 10.18180 10.31508 10.27024 10.46160 10.66006    10
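The same split between sorting and arithmetic can be checked with base R alone. The following is a minimal sketch, not part of the original benchmark: gini_sorted is a hypothetical helper that performs only the arithmetic step and assumes its input is already sorted, and the repetition count of 20 is arbitrary.

```r
# Hypothetical helper: arithmetic part only.
# Assumes x is already sorted in increasing order.
gini_sorted <- function(x) {
  n <- length(x)
  2 / n^2 * sum((2 * seq_len(n) - (n + 1)) * x)
}

set.seed(1)
x  <- rnorm(100000)
xs <- sort(x)

# Time the two steps separately (elapsed seconds over 20 repetitions each).
t_sort <- unname(system.time(for (i in 1:20) sort(x))["elapsed"])
t_sum  <- unname(system.time(for (i in 1:20) gini_sorted(xs))["elapsed"])
print(c(sort = t_sort, arithmetic = t_sum))
```

If the data are already sorted when they reach the function, calling a helper like this directly skips the sort altogether, which is likely where most of the time goes.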
Maybe there's more speed to be had, but it will be similarly marginal.
Upvotes: 1