Reputation: 1227
Hadley's new pryr package that shows the address of a variable is really great for profiling. I have found that whenever a variable is passed into a function, no matter what that function does, a copy of that variable is created. Furthermore, if the body of the function passes the variable into another function, another copy is generated. Here is a clear example
n = 100000
p = 100
bar = function(X) {
print(pryr::address(X))
}
foo = function(X) {
print(pryr::address(X))
bar(X)
}
X = matrix(rnorm(n*p), n, p)
print(pryr::address(X))
foo(X)
Which generates
> X = matrix(rnorm(n*p), n, p)
> print(pryr::address(X))
[1] "0x7f5f6ce0f010"
> foo(X)
[1] "0x92f6d70"
[1] "0x92f3650"
The address changes each time, despite the functions not doing anything. I'm confused by this behavior because I've heard R described as copy on write - so variables can be passed around but copies are only generated when a function wants to write into that variable. What is going on in these function calls?
For best R development is it better to not write multiple small functions, rather keep the content all in one function? I have also found some discussion on Reference Classes, but I see very little R developers using this. Is there another efficient way to pass the variable that I am missing?
Upvotes: 3
Views: 311
Reputation: 6815
You can use the profmem package (I'm the author), to see what memory allocations take place. It requires that your R session was build with "profmem" capabilities:
capabilities()["profmem"]
## profmem
## TRUE
Then, you can do something like this:
n <- 100000
p <- 100
X <- matrix(rnorm(n*p), nrow = n, ncol = p)
object.size(X)
## 80000200 bytes
## No copies / no new objects
bar <- function(X) X
foo <- function(X) bar(X)
## One new object
bar2 <- function(X) 2*X
foo2 <- function(X) bar2(X)
profmem::profmem(foo(X))
## Rprofmem memory profiling of:
## foo(X)
##
## Memory allocations:
## bytes calls
## total 0
profmem::profmem(foo2(X))
## Rprofmem memory profiling of:
## foo2(X)
##
## Memory allocations:
## bytes calls
## 1 80000040 foo2() -> bar2()
## total 80000040
Upvotes: 3
Reputation: 17348
I'm not entirely certain, but address may point to the memory address of the pointer to the object. Take the following example.
library(pryr)
n <- 100000
p <- 500
X <- matrix(rep(1,n*p), n, p)
l <- list()
for(i in 1:10000) l[[i]] <- X
At this point, if each element of l
was a copy of X
, the size of l
would be ~3.5Tb. Obviously this is not the case as your computer would have started smoking. And yet the addresses are different.
sapply(l[1:10], function(x) address(x))
# [1] "0x1062c14e0" "0x1062c0f10" "0x1062bebc8" "0x10641e790" "0x10641dc28" "0x10641c640" "0x10641a800" "0x1064199c0"
# [9] "0x106417380" "0x106411d40"
Upvotes: 5
Reputation: 13122
pryr::address
passes an unevaluated symbol to an internal function that returns its address in the parent.frame()
:
pryr::address
#function (x)
#{
# address2(check_name(substitute(x)), parent.frame())
#}
#<environment: namespace:pryr>
Wrapping of the above function can lead to returning address of a "promise". To illustrate we can simulate pryr::address
's functionality as:
ff = inline::cfunction(sig = c(x = "symbol", env = "environment"), body = '
SEXP xx = findVar(x, env);
Rprintf("%s at %p\\n", type2char(TYPEOF(xx)), xx);
if(TYPEOF(xx) == PROMSXP) {
SEXP pr = eval(PRCODE(xx), PRENV(xx));
Rprintf("\tvalue: %s at %p\\n", type2char(TYPEOF(pr)), pr);
}
return(R_NilValue);
')
wrap1 = function(x) ff(substitute(x), parent.frame())
where wrap1
is an equivalent of pryr::address
.
Now:
x = 1:5
.Internal(inspect(x))
#@256ba60 13 INTSXP g0c3 [NAM(1)] (len=5, tl=0) 1,2,3,4,5
pryr::address(x)
#[1] "0x256ba60"
wrap1(x)
#integer at 0x0256ba60
#NULL
with further wrapping, we can see that a "promise" object is being constructed while the value is not copied:
wrap2 = function(x) wrap1(x)
wrap2(x)
#promise at 0x0793f1d4
# value: integer at 0x0256ba60
#NULL
wrap2(x)
#promise at 0x0793edc8
# value: integer at 0x0256ba60
#NULL
# wrap 'pryr::address' like your 'bar'
( function(x) pryr::address(x) )(x)
#[1] "0x7978a64"
( function(x) pryr::address(x) )(x)
#[1] "0x79797b8"
Upvotes: 4