Performance of .Primitive and .Internal

Question

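I was doing some optimization by removing one step from the process, namely calling the .Internal that paste0 uses directly instead of going through the paste0 wrapper: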

> library(microbenchmark)
> microbenchmark(paste0("this","and","that"))
Unit: microseconds
                          expr   min    lq    mean median    uq    max neval
 paste0("this", "and", "that") 2.026 2.027 3.50933  2.431 2.837 34.038   100

> microbenchmark(.Internal(paste0(list("this","and","that"),NULL)))
Unit: microseconds
                                                 expr   min    lq    mean median    uq    max neval
 .Internal(paste0(list("this", "and", "that"), NULL)) 1.216 1.621 2.77596  2.026 2.027 43.764   100
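For context, the step being removed is the R-level paste0 closure, which only forwards its arguments to .Internal. A quick way to check this on your own installation is to inspect the function. In newer R releases paste0() also has a recycle0 argument that is forwarded to the .Internal call, so the exact argument list may differ from the two-argument form used in this question.

```r
# paste0 is an ordinary R closure, not a primitive.
typeof(paste0)            # "closure"

# Its body is the .Internal(paste0(...)) call that the direct version
# above reproduces by hand; the argument list depends on the R version.
body(paste0)

# The formal arguments that get forwarded (collapse, plus recycle0 in newer R).
names(formals(paste0))
```

If body(paste0) shows more arguments inside paste0(...) than a hand-written .Internal call passes, that hand-written call will likely fail with an argument-count error.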

So far so good.

But then, after noticing that list is defined as

function (...)  .Primitive("list")

I tried to "simplify" it further:

> microbenchmark(.Internal(paste0(.Primitive("list")("this","and","that"),NULL)))
Unit: microseconds
                                                               expr   min    lq    mean median    uq    max neval
 .Internal(paste0(.Primitive("list")("this", "and", "that"), NULL)) 3.241 3.242 4.66433  3.647 3.648 80.638   100

and the time increases!
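Both spellings return the same object and build the same list; what differs is how much work R does before the list is built. A small sketch that makes this visible by quoting the two calls and inspecting their structure without evaluating them (e1 and e2 are just illustrative names):

```r
# .Primitive("list") returns the very same primitive that `list` is bound to.
stopifnot(is.primitive(list))
stopifnot(identical(.Primitive("list"), list))

# Both spellings build identical lists.
stopifnot(identical(list("this", "and", "that"),
                    .Primitive("list")("this", "and", "that")))

# Quote the two calls to look at their structure.
e1 <- quote(list("this", "and", "that"))
e2 <- quote(.Primitive("list")("this", "and", "that"))

is.symbol(e1[[1]])  # TRUE: R only has to look up the symbol `list`
is.call(e2[[1]])    # TRUE: R must first evaluate the call .Primitive("list")
e2[[1]]             # this inner call is re-evaluated every time e2 is evaluated
```

In other words, the "simplified" form adds a whole extra function call (.Primitive applied to the string "list") to every evaluation, which is consistent with it being slower.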

My guess is that processing the string "list" is the source of the problem, and that it is handled differently when the function list is actually called.

But how?

Disclaimer: I know this hurts readability more than it helps performance. It is only for a few very simple functions that will not change and are called so often that even slight performance gains are worth this cost.

Edit in response to Josh O'Brien's comment:

I'm not sure what this says about his idea, but

> library(compiler)
> ff <- compile(function(...){.Internal(paste0(.Primitive("list")("this","and","that"),NULL))})
> ff2 <- compile(function(...){.Internal(paste0(list("this","and","that"),NULL))})
> microbenchmark(eval(ff2),eval(ff),times=10000)
Unit: microseconds
      expr   min    lq     mean median    uq     max neval
 eval(ff2) 1.621 2.026 2.356761  2.026 2.431 144.257 10000
  eval(ff) 1.621 2.026 2.455913  2.026 2.431  89.148 10000

Looking at the plot of the microbenchmark result (just wrap the call in plot() to see it yourself) over several runs, the two appear to have statistically indistinguishable performance, even though the "max" value makes ff2 look as if it has a worse worst case. I don't know what to make of that, but maybe it will help someone. Basically, all this suggests that they compile to identical code. Does that mean his comment is the answer?
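One thing worth checking in this benchmark: as far as I can tell, evaluating the object returned by compile() here just yields a function, so eval(ff) and eval(ff2) never run paste0 at all, which would explain the identical timings. A sketch that instead compiles the functions with cmpfun() and actually calls them, restricted to the list-building step so it does not depend on the version-specific .Internal(paste0(...)) arguments (f_list and f_prim are hypothetical names):

```r
library(compiler)

# Evaluating the compiled result of a function *expression* just returns
# a function; its body is never executed.
ff <- compile(function(...) paste0("this", "and", "that"))
is.function(eval(ff))   # TRUE

# Compile the functions themselves, then call them.
f_list <- cmpfun(function() list("this", "and", "that"))
f_prim <- cmpfun(function() .Primitive("list")("this", "and", "that"))
stopifnot(identical(f_list(), f_prim()))

# Note the parentheses: f_list() and f_prim() are calls, not symbols.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(f_list(), f_prim(), times = 10000))
}
```

No timings are claimed here; they depend on the machine and R version.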

Joshua Ulrich · Accepted Answer

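The reason .Internal(paste0(.Primitive("list")("this","and","that"),NULL)) is slower seems to be what Josh O'Brien guessed: calling .Primitive("list") directly incurs some additional overhead. .Primitive("list") is itself a function call that is evaluated every time, and it looks the primitive up by name in R's internal function table, whereas list(...) only needs a symbol lookup.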

You can see the effects via a simple example:

require(compiler)
require(microbenchmark)
pl <- cmpfun({.Primitive("list")})
microbenchmark(list(), .Primitive("list")(), pl())
# Unit: nanoseconds
#                  expr  min     lq median     uq   max neval
#                list()   63   98.0  112.0  140.5   529   100
#  .Primitive("list")() 4243 4391.5 4486.5 4606.0 16077   100
#                  pl()   79  135.5  148.0  175.5 39108   100
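A detail about pl: as far as I can tell, cmpfun() returns primitives unchanged, so the braces in cmpfun({.Primitive("list")}) are evaluated once when pl is defined, and pl ends up being the list primitive itself. That suggests the speedup comes from doing the .Primitive("list") lookup once instead of on every call, which can be sketched without the compiler (my_list is a hypothetical name):

```r
library(compiler)

# cmpfun() hands builtin/special primitives back as-is.
pl <- cmpfun({.Primitive("list")})
identical(pl, list)          # TRUE

# Same effect without the compiler: look the primitive up once and reuse it.
my_list <- .Primitive("list")
identical(my_list("this", "and", "that"), list("this", "and", "that"))  # TRUE
```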

That said, you're not going to be able to improve the speed of .Primitive and .Internal from the R prompt. They are both entry points to C code.

And there's no reason to try to replace a call to .Primitive with .Internal. That would be circular, since .Internal is itself a primitive:

> .Internal
function (call)  .Primitive(".Internal")
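The same check can be done programmatically. .Internal is a "special" primitive (its argument is passed unevaluated, which is how it receives the inner call), and .Primitive(".Internal") hands back that very same object:

```r
is.primitive(.Internal)                         # TRUE
typeof(.Internal)                               # "special": argument not evaluated first
identical(.Primitive(".Internal"), .Internal)   # TRUE
is.primitive(.Primitive)                        # TRUE: .Primitive is a primitive too
```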

You'll get the same slowness if you try to call .Internal "directly"... and a similar "speedup" if you compile the "direct" call.

Internal. <- function() .Internal(paste0(list("this","and","that"),NULL))
Primitive. <- function() .Primitive(".Internal")(paste0(list("this","and","that"),NULL))
cPrimitive. <- cmpfun({Primitive.})
microbenchmark(Internal., Primitive., cPrimitive., times=1e4)
# Unit: nanoseconds
#         expr min lq median uq  max neval
#    Internal.  26 27     27 28 1057 10000
#   Primitive.  28 32     32 33 2526 10000
#  cPrimitive.  26 27     27 27 1706 10000
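Note that microbenchmark(Internal., Primitive., cPrimitive.) evaluates the three symbols without calling the functions, so those nanosecond timings mostly measure symbol lookup. A sketch that times actual calls is below. It builds the functions from paste0's own body so it does not hard-code the version-dependent .Internal argument list; it assumes (and checks) that body(paste0) is a single .Internal(...) call. Also, since R 3.4.0 the JIT compiler is enabled by default, so Primitive. may be byte-compiled automatically after its first calls, and the gap to cPrimitive. may shrink or vanish.

```r
library(compiler)

# Assumption: paste0's body is a single .Internal(...) call; stop early if not.
b <- body(paste0)
stopifnot(is.call(b), identical(b[[1]], as.name(".Internal")))

# Internal.: a fresh copy of paste0 that calls .Internal as usual.
Internal. <- paste0
body(Internal.) <- b

# Primitive.: same body, but .Internal is fetched via .Primitive() on every call.
b_prim <- b
b_prim[[1]] <- quote(.Primitive(".Internal"))
Primitive. <- paste0
body(Primitive.) <- b_prim

# cPrimitive.: explicitly byte-compiled version of Primitive.
cPrimitive. <- cmpfun(Primitive.)

# All three must agree before timing them.
stopifnot(identical(Internal.("this", "and", "that"), "thisandthat"),
          identical(Primitive.("this", "and", "that"), "thisandthat"),
          identical(cPrimitive.("this", "and", "that"), "thisandthat"))

# Note the parentheses: these expressions call the functions.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    Internal.("this", "and", "that"),
    Primitive.("this", "and", "that"),
    cPrimitive.("this", "and", "that"),
    times = 1e4))
}
```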
