Reputation: 1479
I am currently taking a class in R programming and was playing around with the following example function for caching the result of a potentially time-consuming operation. More context on its purpose can be found on GitHub:
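makeVector <- function(x = numeric()) {
  m <- NULL                 # cached mean; NULL means "not computed yet"
  set <- function(y) {
    x <<- y
    m <<- NULL              # invalidate the cache when the data changes
  }
  get <- function() {
    return(x)
  }
  getmean <- function() {
    if (is.null(m)) {       # compute the mean only on a cache miss
      m <<- mean(x)
    }
    return(m)
  }
  list(set = set, get = get,
       getmean = getmean)
}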
I stumbled upon the following strange behaviour:
When I use a temporary vector to initialize the cached vector and delete it before the cached vector object first accesses it, the cached vector forgets its contents. It throws an error when I try to access it using get() (I print only the length for less output):
> v<-1:1e6
> cv<-makeVector(v)
> rm(v)
> length(cv$get())
Error in cv$get() : object 'v' not found
Watching the memory usage using library("pryr") tells me that memory is allocated for the creation of the vector v, but not for cv (at least not enough to store v in it). This amount is freed when rm(v) is called. So there is really no accessible information on the vector's content anymore.
Now I do the same thing but I access the cached vector's contents before removing v. This makes it accessible even after v is removed.
> v<-1:1e6
> cv<-makeVector(v)
> length(cv$get())
[1] 1000000
> rm(v)
> length(cv$get())
[1] 1000000
Memory profiling suggests that R does not copy the vector when accessing it (no significant increase in memory usage when calling length(cv$get())). But the memory is not freed when removing v. So R is now aware that the contents of the vector are still in use, and the memory is not deallocated. Accessing it still works after v is removed. Removing the cached vector (rm(cv)) will free the memory.
Here is an example case where recycling temporary variables (which I might want to do in a loop) leads to wrong data being stored without a warning:
> v<-1:10
> cv1<-makeVector(v)
> v<-51:60
> cv2<-makeVector(v)
> cv1$get()
[1] 51 52 53 54 55 56 57 58 59 60
> cv2$get()
[1] 51 52 53 54 55 56 57 58 59 60
Note that I get the expected result when accessing the member variable before reinitializing v:
> v<-1:10
> cv1<-makeVector(v)
> cv1$get()
[1] 1 2 3 4 5 6 7 8 9 10
> v<-51:60
> cv2<-makeVector(v)
> cv1$get()
[1] 1 2 3 4 5 6 7 8 9 10
> cv2$get()
[1] 51 52 53 54 55 56 57 58 59 60
Is this behaviour intended or is this a bug in R? Or is this simply undefined behaviour (because it is not mentioned in the standard) and this kind of caching is more like a dirty hack that I should never use?
For reference: I am using R version 3.1.1 on a 64bit Linux system.
Upvotes: 0
Views: 127
Reputation: 94222
Welcome to the wacky world of lazy evaluation and promises...
When you do:
cv<-makeVector(v)
your code never evaluates v, it just defines some functions in the object that are going to use v (now called x as the argument name) later.
So then you rm(v), and call cv$get(). Only at this point does x get looked at, and oh dear, v has gone.
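The same sequence can be reproduced with a much smaller closure. This is only a minimal sketch: makeGetter is a hypothetical name used for illustration and is not part of the code in the question.

```r
# Hypothetical helper: returns a closure over the still unevaluated argument x.
makeGetter <- function(x) {
  function() x
}

v <- 1:3
g <- makeGetter(v)   # x is a promise for the expression `v`; nothing is evaluated yet
rm(v)

# The promise is forced only now, and `v` no longer exists where it is looked up.
# tryCatch turns the resulting error into its message so the script keeps running.
result <- tryCatch(g(), error = function(e) conditionMessage(e))
print(result)
```

The printed message should be the same kind of "object not found" error as in the question.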
The first time this hits you, it looks like a Heisenbug. If you put print(x) after the function(x){ line, then x prints out okay, and the function works. Get rid of the print(x) and the function fails again. Observing the system seems to change the state of the system, just like Quantum Theory.
The explanation is that printing x evaluates x, and so now x is no longer an unevaluated promise related to v. You can safely remove v.
The conventional way to work round this is to do something that evaluates x in the outer function context. In the absence of any other way, use force(x), a function that does nothing except evaluate its argument.
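Applied to the function from the question, that might look like the sketch below. The only deliberate addition is the force(x) line; the cache test is written as is.null(m) here so that the mean is computed on first use, which is an assumption about the intended behaviour.

```r
makeVector <- function(x = numeric()) {
  force(x)   # evaluate the promise now, while the caller's variable still exists
  m <- NULL
  set <- function(y) {
    x <<- y
    m <<- NULL
  }
  get <- function() x
  getmean <- function() {
    if (is.null(m)) {   # assumed intent: compute only when nothing is cached
      m <<- mean(x)
    }
    m
  }
  list(set = set, get = get, getmean = getmean)
}

v <- 1:1e6
cv <- makeVector(v)
rm(v)                    # safe now: x was already evaluated inside makeVector
n <- length(cv$get())
print(n)
```

With x forced up front, removing v before the first get() should no longer matter.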
In your third code fragment, calling cv$get() and then removing v works because cv$get() evaluates x, finds v, and all is well. x is now evaluated.
That explains your first problem, I'm stopping there because I suspect everything else is related...
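The recycled-variable case from the question appears to follow the same pattern: the promise is evaluated on the first get(), by which time v already holds the new value. A small sketch, where lazyBox and eagerBox are hypothetical names made up for this comparison:

```r
# Hypothetical helpers: the first leaves x as an unevaluated promise,
# the second forces it before returning.
lazyBox  <- function(x) list(get = function() x)
eagerBox <- function(x) { force(x); list(get = function() x) }

v <- 1:10
lazy1  <- lazyBox(v)    # x is still a promise for `v`
eager1 <- eagerBox(v)   # x has already been evaluated to 1:10
v <- 51:60              # recycle the temporary variable

lazy_first  <- lazy1$get()[1]    # promise is forced here and sees the new v
eager_first <- eager1$get()[1]   # keeps the value captured at construction
print(c(lazy_first, eager_first))
```

The lazy version picks up whatever v holds at the time of the first access, while the forced version keeps the value it was constructed with.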
tl;dr: just stick force(x) at the top of the body of makeVector, right after the function(x = numeric()) { line
further reading: I suspect Hadley Wickham's Advanced R book explains this better.
Upvotes: 3