hedgedandlevered

Reputation: 2394

Performance of .Primitive and .Internal

I was trying to optimize some code by removing one step from the process, calling the .Internal behind paste0 directly:

> library(microbenchmark)
> microbenchmark(paste0("this","and","that"))
Unit: microseconds
                          expr   min    lq    mean median    uq    max neval
 paste0("this", "and", "that") 2.026 2.027 3.50933  2.431 2.837 34.038   100

> microbenchmark(.Internal(paste0(list("this","and","that"),NULL)))
Unit: microseconds
                                                 expr   min    lq    mean median    uq    max neval
 .Internal(paste0(list("this", "and", "that"), NULL)) 1.216 1.621 2.77596  2.026 2.027 43.764   100

So far so good.

But then I noticed that list is defined as

function (...)  .Primitive("list")

so I tried to "simplify" further:

> microbenchmark(.Internal(paste0(.Primitive("list")("this","and","that"),NULL)))
Unit: microseconds
                                                               expr   min    lq    mean median    uq    max neval
 .Internal(paste0(.Primitive("list")("this", "and", "that"), NULL)) 3.241 3.242 4.66433  3.647 3.648 80.638   100

and the time increases!

My guess is that processing the string "list" is the source of the slowdown, and that it is handled differently when list is called by name.

But how?

Disclaimer: I know this hurts readability more than it helps performance. This is just for some very simple functions that will not change and are used so often that slight performance gains are desired even at this cost.


Edit in response to Josh O'Brien's comment:

I'm not sure what this says about his idea, but

> library(compiler)
> ff <- compile(function(...){.Internal(paste0(.Primitive("list")("this","and","that"),NULL))})
> ff2 <- compile(function(...){.Internal(paste0(list("this","and","that"),NULL))})
> microbenchmark(eval(ff2),eval(ff),times=10000)
Unit: microseconds
      expr   min    lq     mean median    uq     max neval
 eval(ff2) 1.621 2.026 2.356761  2.026 2.431 144.257 10000
  eval(ff) 1.621 2.026 2.455913  2.026 2.431  89.148 10000

Looking at the plot generated from the microbenchmark result (wrap the call in plot() to see it yourself) over many runs, the two appear to have statistically identical performance, even though the "max" value makes ff2 look like it has a worse worst case. I don't know what to make of that, but maybe it will help someone. All of this suggests that they compile to identical code. Does that mean his comment is the answer?
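One caveat, offered as an assumption rather than a measured result: eval() on the object returned by compile() appears to yield the function itself without running its body, so the timings above may not include the paste0 call at all. A minimal sketch of a comparison that does call the functions, using cmpfun() and dropping the .Internal wrapper so that it does not depend on the internal signature of paste0 (f_prim and f_sym are names made up for this sketch):

```r
library(compiler)

# Hypothetical helper names; both build the same three-element list.
f_prim <- cmpfun(function() .Primitive("list")("this", "and", "that"))
f_sym  <- cmpfun(function() list("this", "and", "that"))

# Both variants must return the same value before timing means anything.
stopifnot(identical(f_prim(), f_sym()))

# Time the calls themselves (note the parentheses), if microbenchmark is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(f_prim(), f_sym(), times = 1000))
}
```

The requireNamespace() guard only keeps the sketch runnable on a machine without microbenchmark; the timings themselves will vary by machine and R version.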

Upvotes: 14

Views: 400

Answers (2)

Neal Fultz

Reputation: 9696

The R interpreter has hardcoded optimizations for common functions, and this goes deeper than byte compiling:

> library(compiler)
> library(microbenchmark)
> list2 <- list
> list3 <- cmpfun(list2)
> microbenchmark(
+   list(1,2),
+   list2(1,2),
+   list3(1,2)
+ )
Unit: nanoseconds
        expr min    lq    mean median    uq   max neval
  list(1, 2) 576 620.5  654.53  640.0 675.5   941   100
 list2(1, 2) 619 702.0 1123.43  728.0 761.0 39045   100
 list3(1, 2) 617 683.0  735.83  715.5 759.0  1964   100

Here's what the SEXPs look like. Note the metadata on the "list" symbol:

> .Internal(inspect(quote(list(1,2))))
@23b0ed0 06 LANGSXP g0c0 [NAM(2)] 
  @1ed8f48 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "list" (has value)
  @2c7adf8 14 REALSXP g0c1 [] (len=1, tl=0) 1
  @2c7adc8 14 REALSXP g0c1 [] (len=1, tl=0) 2

list2 is missing some metadata:

> list2 <- list
> .Internal(inspect(quote(list2(1,2))))
@23b1578 06 LANGSXP g0c0 [NAM(2)] 
  @23b0a70 01 SYMSXP g0c0 [] "list2"
  @2c7ad08 14 REALSXP g0c1 [] (len=1, tl=0) 1
  @2c7acd8 14 REALSXP g0c1 [] (len=1, tl=0) 2

In .Primitive("list")(1,2) the function position is itself a call, not a symbol, so the expression is more complicated:

> .Internal(inspect(quote(.Primitive("list")(1,2))))
@297e748 06 LANGSXP g0c0 [NAM(2)] 
  @297d9a0 06 LANGSXP g0c0 [] 
    @1ec4530 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] ".Primitive" (has value)
    @2c7a888 16 STRSXP g0c1 [] (len=1, tl=0)
      @1ed5588 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "list"
  @2c7a858 14 REALSXP g0c1 [] (len=1, tl=0) 1
  @2c7a828 14 REALSXP g0c1 [] (len=1, tl=0) 2
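The same structural difference can be checked without .Internal(inspect()). A minimal sketch, where e_sym and e_call are names introduced only for illustration:

```r
e_sym  <- quote(list(1, 2))
e_call <- quote(.Primitive("list")(1, 2))

is.symbol(e_sym[[1]])   # TRUE: the function position is the symbol `list`
is.call(e_call[[1]])    # TRUE: the function position is the call .Primitive("list")

# Despite the different shapes, both expressions evaluate to the same value.
identical(eval(e_sym), eval(e_call))
```

So the second form has to evaluate an inner call before it can even find the function to apply, which is one plausible source of the extra time.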

Upvotes: 0

Joshua Ulrich

Reputation: 176718

The reason .Internal(paste0(.Primitive("list")("this","and","that"),NULL)) is slower appears to be what Josh O'Brien guessed: calling .Primitive("list") directly incurs some additional overhead.

You can see the effects via a simple example:

library(compiler)
library(microbenchmark)
pl <- cmpfun({.Primitive("list")})
microbenchmark(list(), .Primitive("list")(), pl())
# Unit: nanoseconds
#                  expr  min     lq median     uq   max neval
#                list()   63   98.0  112.0  140.5   529   100
#  .Primitive("list")() 4243 4391.5 4486.5 4606.0 16077   100
#                  pl()   79  135.5  148.0  175.5 39108   100
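If the extra cost comes from evaluating the .Primitive("list") call on every invocation, one way to avoid it is to do the lookup once and reuse the result. A minimal sketch, where plist is a hypothetical name:

```r
# Look the primitive up by name once, outside any hot loop.
plist <- .Primitive("list")

# The result is the same object that the symbol `list` is bound to in base.
identical(plist, base::list)

# Calling it behaves exactly like calling list().
x <- plist("this", "and", "that")
identical(x, list("this", "and", "that"))
```

Since plist is just another binding for the same primitive, this is not expected to be faster than writing list() directly.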

That said, you're not going to be able to improve the speed of .Primitive and .Internal from the R prompt. They are both entry points to C code.

And there's no reason to try to replace a call to .Primitive with .Internal. That's circular, since .Internal is itself a primitive.

> .Internal
function (call)  .Primitive(".Internal")
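This can also be checked programmatically. A short sketch using only base R:

```r
is.primitive(.Internal)    # TRUE: .Internal is a primitive
is.primitive(.Primitive)   # TRUE: so is .Primitive
is.primitive(paste0)       # FALSE: paste0 is an ordinary closure around .Internal

typeof(.Internal)          # "special": its argument is not evaluated before the call
typeof(list)               # "builtin": its arguments are evaluated first
```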

You'll get the same slowness if you try to call .Internal "directly"... and a similar "speedup" if you compile the "direct" call.

Internal. <- function() .Internal(paste0(list("this","and","that"),NULL))
Primitive. <- function() .Primitive(".Internal")(paste0(list("this","and","that"),NULL))
cPrimitive. <- cmpfun({Primitive.})
microbenchmark(Internal., Primitive., cPrimitive., times=1e4)
# Unit: nanoseconds
#         expr min lq median uq  max neval
#    Internal.  26 27     27 28 1057 10000
#   Primitive.  28 32     32 33 2526 10000
#  cPrimitive.  26 27     27 27 1706 10000

Upvotes: 11
