Reputation: 421
The following code:
require(R6)
Array <- R6Class(
"Array",
public=list(
x=matrix(0,0,0),
initialize=function(a,b,c){
self$x <- matrix(a,b,c)
},
assign=function(z){
self$x[1,1] <- z
invisible(self)
}
)
)
x <- Array$new(0,10,10)
tracemem(x$x)
x$assign(1)
y <- matrix(0,10,10)
tracemem(y)
y[1,1] <- 1
yields this output:
> [1] "<0x55ae7be40040>"
> tracemem[0x55ae7be40040 -> 0x55ae7b403700]: <Anonymous>
> > [1] "<0x55ae7b254c90>"
> >
which implies that when the R6 member array is updated, a copy of the array is made, whereas when the plain array is updated, the element is updated "in place" (i.e. no copy is made).
Any idea how I can get the same in-place update for the R6 object?
Upvotes: 2
Views: 65
Reputation: 4127
This happens because of the semantics for subset assignment in R. Subset assignment is the $<- operator (as in x$y <- 1) or the [<- operator (as in x[y] <- 1).
In R, when you do something like x$y <- z, that actually gets turned into something like this:
`*tmp*` <- x
x <- "$<-"(`*tmp*`, y, value = z)
rm(`*tmp*`)
This creates `*tmp*`, which initially points to the same object in memory as x. However, when the assignment to x happens in the second line, R makes a copy of the object and modifies it. This copy of the object needs to be GC'd (garbage collected) later, and that takes time.
See here for more info about subset assignment: https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment
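As a rough illustration of that expansion, the sketch below runs the sugared form and a hand-written expansion side by side on a plain list. The names x and x2 are made up for the example, and the expansion only approximates what the interpreter does internally.

```r
# Sugared form: subset assignment with `$<-`
x <- list(y = 0)
x$y <- 1

# Hand-written approximation of what R does behind the scenes
x2 <- list(y = 0)
`*tmp*` <- x2                          # temporary reference to the same object
x2 <- `$<-`(`*tmp*`, "y", value = 1)   # call the replacement function explicitly
rm(`*tmp*`)                            # drop the temporary reference

# Both routes end with the same value
print(identical(x, x2))
```

The second reference held by `*tmp*` is the reason R cannot always assume it is safe to modify the object in place.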
(Note that in some cases, R knows that there is only a single reference to the object, and when that happens, it will modify the object in place. That is what happens with y in your example above, but not with x.)
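A small base R sketch of this reference-counting behaviour, assuming an R build with memory profiling enabled (the tracemem calls are skipped otherwise). The variable z is introduced only to create a second reference. Whether the first assignment is done in place can depend on the R version and on how the code is run.

```r
y <- matrix(0, 10, 10)
if (capabilities("profmem")) tracemem(y)   # report copies of y, if supported

# Only one reference to the matrix exists, so R may be able to modify it in place
y[1, 1] <- 1

# Create a second reference to the same matrix
z <- y

# Now R has to copy before modifying, otherwise z would change too
y[1, 1] <- 2

if (capabilities("profmem")) untracemem(y)

print(z[1, 1])   # z still holds the value from before the second assignment
print(y[1, 1])
```

If tracemem is available, a copy message is expected around the second assignment rather than the first.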
One way to avoid this extra copying is to use the <<- operator instead. First, here's a demonstration of using it without R6 to show the speed difference:
local({
self <- environment()
x <- 0
y <- 0
bench::mark(
self$x <- x+1,
y <<- y+1,
iterations = 1e5
)
})
#> # A tibble: 2 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#> 1 self$x <- x + 1 246ns 369ns 2405536. 0B 96.2 99996 4 41.57ms
#> 2 y <<- y + 1 0 41ns 16327450. 0B 0 100000 0 6.12ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
For the self$x <- x+1 case, notice the number of GC events, 4. And for the y <<- y+1 case, there are 0 GC events, and each iteration is about 9x faster on average.
The reason <<- is so much faster here is that, when you do y <<- y+1, it replaces y directly in place, without making a copy.
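To show how <<- reaches a variable in an enclosing environment, here is a minimal base R sketch using a closure instead of R6. The names new_array, assign_first and get are hypothetical and exist only for this example. It illustrates the scoping behaviour only and does not by itself prove that no copy is made, which can be checked with tracemem as in the question.

```r
# Build a small "object" whose state lives in the enclosing environment
new_array <- function(a, b, c) {
  x <- matrix(a, b, c)

  assign_first <- function(z) {
    # x is not defined locally, so <<- updates the x in the enclosing environment
    x[1, 1] <<- z
    invisible(NULL)
  }

  get <- function() x

  list(assign_first = assign_first, get = get)
}

arr <- new_array(0, 10, 10)
arr$assign_first(1)
print(arr$get()[1, 1])
```

The pattern is the same one the answer relies on: the method body refers to x directly and updates it with <<-, rather than going through self$x.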
In order to use <<- with an R6 object, you need to set portable=FALSE. Here's a modified version:
require(R6)
Array <- R6Class(
"Array",
portable=FALSE,
public=list(
x = matrix(0,0,0),
initialize=function(a,b,c){
x <<- matrix(a,b,c)
},
assign=function(z){
x[1,1] <<- z
invisible(self)
}
)
)
x <- Array$new(0,10,10)
tracemem(x$x)
x$assign(1)
y <- matrix(0,10,10)
tracemem(y)
y[1,1] <- 1
When I run it, x$assign(1) does not cause a tracemem message to be printed, which is the same behavior as y[1,1] <- 1.
This will improve performance, but note that portable=FALSE will cause problems if you try to use inheritance with this class across packages (https://r6.r-lib.org/articles/Portable.html).
See my comments on this GitHub issue for a little more detail: https://github.com/r-lib/R6/issues/201#issuecomment-583486168
Upvotes: 1