user1407220

Reputation: 421

R6 array member, copy on update

The following code

require(R6)
Array <- R6Class(
  "Array",
  public=list(
    x=matrix(0,0,0),
    initialize=function(a,b,c){
      self$x <- matrix(a,b,c)
    },
    assign=function(z){
      self$x[1,1] <- z
      invisible(self)
    }
  )
)
x <- Array$new(0,10,10)
tracemem(x$x)
x$assign(1)
y <- matrix(0,10,10)
tracemem(y)
y[1,1] <- 1

yields this output

[1] "<0x55ae7be40040>"
tracemem[0x55ae7be40040 -> 0x55ae7b403700]: <Anonymous> 
[1] "<0x55ae7b254c90>"

which implies that when the R6 member array is updated, a copy of the array is made, whereas when the plain array is updated, the element is updated "in-place" (i.e. no copy is generated).

Any idea how I can get the same in-place update for the R6 member?

Upvotes: 2

Views: 65

Answers (1)

wch

Reputation: 4127

This happens because of the semantics for subset assignment in R. Subset assignment is the $<- operator (as in x$y <- 1) or the [<- operator (as in x[y] <- 1).

In R, when you do something like self$x <- y, that actually gets turned into something like this:

`*tmp*` <- self
self <- "$<-"(`*tmp*`, x, value = y)
rm(`*tmp*`)

This creates `*tmp*`, which initially points to the same object in memory as self. However, when the replacement function runs in the second line, the value being modified can be reachable through more than one reference, so R makes a copy of it and modifies the copy instead of changing it in place. The leftover object needs to be GC'd (garbage collected) later, and that takes time.

See here for more info about subset assignment: https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment

(Note that in some cases, R knows that there is only a single reference to the object, and when that happens, it will modify the object in place. That is what happens with y in your example above, but not with x.)
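To make the rewriting concrete, here is a rough hand-written sketch of how a nested assignment such as self$x[1,1] <- z might be expanded, following the pattern in the R language definition. The environment named self is only a hypothetical stand-in for an R6 object, and the exact steps R takes internally may differ.

```r
# Hypothetical stand-in for an R6 object: a plain environment holding a matrix.
self <- new.env()
self$x <- matrix(0, 2, 2)

# Approximate manual expansion of `self$x[1, 1] <- 5`.
`*tmp*` <- self
self <- `$<-`(
  `*tmp*`, "x",
  # The inner replacement receives the matrix fetched from the environment.
  # That matrix is still bound to `x` inside the environment, so R cannot
  # assume it holds the only reference to it.
  value = `[<-`(`*tmp*`$x, 1, 1, value = 5)
)
rm(`*tmp*`)

self$x[1, 1]
```

The inner `[<-` call is the step where a copy can occur, because the matrix it is handed is also still bound inside the environment.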

One way to avoid this extra copying is to use the <<- operator instead. First, here's a demonstration of using it without R6 to show the speed difference:

local({
  self <- environment()
  x <- 0
  y <- 0
  bench::mark(
    self$x <- x+1,
    y <<- y+1,
    iterations = 1e5
  )
})
#> # A tibble: 2 × 13
#>   expression        min median `itr/sec` mem_alloc `gc/sec`  n_itr  n_gc total_time
#>   <bch:expr>      <bch> <bch:>     <dbl> <bch:byt>    <dbl>  <int> <dbl>   <bch:tm>
#> 1 self$x <- x + 1 246ns  369ns  2405536.        0B     96.2  99996     4    41.57ms
#> 2 y <<- y + 1         0   41ns 16327450.        0B      0   100000     0     6.12ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

For the self$x <- x+1 case, notice the number of GC events: 4. For the y <<- y+1 case, there are 0 GC events, and the median iteration is about 9x faster (41ns vs. 369ns).

The reason <<- is so much faster here is that y <<- y+1 rebinds y directly in the enclosing environment, without going through the `*tmp*` rewriting described above.
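As a minimal illustration of what <<- does, independent of R6 and of any timing, the sketch below uses a closure whose function updates a variable in its enclosing environment. The names make_counter, increment, and get are hypothetical and chosen only for this example.

```r
# A closure-based counter: `count` lives in the environment created by
# each call to make_counter(), and `<<-` rebinds it there.
make_counter <- function() {
  count <- 0
  list(
    increment = function() {
      # `<<-` searches the enclosing environments for `count` and
      # assigns to the existing binding instead of creating a local one.
      count <<- count + 1
      invisible(count)
    },
    get = function() count
  )
}

counter <- make_counter()
counter$increment()
counter$increment()
counter$get()
```

With portable=FALSE, R6 methods are set up so that the object's fields can be found the same way, which is why x <<- ... works inside a method in that mode.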


In order to use <<- with an R6 object, you need to set portable=FALSE. Here's a modified version:

require(R6)
Array <- R6Class(
  "Array",
  portable=FALSE,
  public=list(
    x = matrix(0,0,0),
    initialize=function(a,b,c){
      x <<- matrix(a,b,c)
    },
    assign=function(z){
      x[1,1] <<- z
      invisible(self)
    }
  )
)
x <- Array$new(0,10,10)
tracemem(x$x)
x$assign(1)
y <- matrix(0,10,10)
tracemem(y)
y[1,1] <- 1

When I run it, x$assign(1) does not cause a tracemem message to be printed, which is the same behavior as y[1,1] <- 1 on the plain matrix.

This will improve performance, but note that portable=FALSE will cause problems if you try to use inheritance with this class across packages. (https://r6.r-lib.org/articles/Portable.html)

See my comments on this GitHub issue for a little more detail: https://github.com/r-lib/R6/issues/201#issuecomment-583486168

Upvotes: 1
