Reputation: 652
I'm getting behaviour I don't understand when saving environments. The code below demonstrates the problem. I would have expected the two files (far-too-big.RData and right-size.RData) to be the same size, and also very small, because the environments they contain are empty.
In fact, far-too-big.RData ends up the same size as bigfile.RData.
I get the same results using 2.14.1 and 2.15.2, both on WinXP 5.1 SP3. Can anyone explain why this is happening?
Both far-too-big.RData and right-size.RData, when loaded into a new R session, appear to contain nothing, i.e. they return character(0) in response to ls(). However, if I switch the saves to include ascii=TRUE and open the result in a text editor, I can see that far-too-big.RData contains the data in bigfile.RData.
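# Create a large matrix and save it to disk
a <- matrix(runif(1000000, 0, 1), ncol=1000)
save(a, file="bigfile.RData")
fn <- function() {
    # Loads 'a' into the local environment of this function call
    load("bigfile.RData")
    # Empty environment with the default parent
    test <- new.env()
    save(test, file="far-too-big.RData")
    # Empty environment whose parent is explicitly the global environment
    test1 <- new.env(parent=globalenv())
    save(test1, file="right-size.RData")
}
fn()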
Upvotes: 20
Views: 4274
Reputation: 14872
This is not my area of expertise, but I believe environments work like this.
The result of the above in your case is:
- When fn() is called, it creates its own local environment (green), whose parent by default is globalenv() (grey).
- When test (red) is created inside fn(), its parent defaults to fn()'s environment (green). test will therefore include the object a.
- When test1 (blue) is created and explicitly states that its parent is globalenv(), it is separated from fn()'s environment and does not inherit the object a.
So when saving test you also save a (somewhat hidden) copy of the object a. This does not happen when you save test1, as it does not include the object a.
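The parent relationships described above can be checked directly. The following is a minimal sketch, not part of the original answer; the function name show_parents and the variable names are made up for illustration.

```r
# Hypothetical helper: compare the default parent of new.env()
# with an explicitly chosen one.
show_parents <- function() {
  local_env <- environment()                   # environment of this function call
  e_default <- new.env()                       # parent defaults to the calling frame
  e_global  <- new.env(parent = globalenv())   # parent set explicitly
  list(
    default_is_local   = identical(parent.env(e_default), local_env),
    explicit_is_global = identical(parent.env(e_global), globalenv())
  )
}

res <- show_parents()
print(res)
```

Both entries should come out TRUE, which matches the description of test and test1 above.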
Apparently this is a more complicated topic than I used to believe. Although I might just be quoting @joris-mays's answer now, I'd like to take a final go at it.
To me the most intuitive visualization of environments would be a tree structure, see below, where each node is an environment and the arrows point to its respective enclosing environment (which I would like to believe is the same as its parent, but that has to do with frames and is beyond my corner of the world). A given environment encloses all objects you can reach by moving down the tree and it can access all objects you can reach by moving up the tree. When you save an environment it appears you save all objects and environments that are both enclosed by it and accessible from it (with the exception of globalenv()).
However, the take-home message is as Joris already stated: save your objects as lists and you don't need to worry.
If you want to know more, I can recommend Norman Matloff's excellent book The Art of R Programming. It is aimed at software development in R rather than primary data analysis and assumes you have a fair bit of programming experience. I must admit I haven't fully digested the environment part yet, but as the rest of the book is very well written and pedagogical, I assume this part is too.
Upvotes: 17
Reputation: 108583
Actually, it's the other way around from what @Backlin shows: the parent environment is the one that encloses the other ones. So in the case you define, the enclosing environment of test is the local environment of fn, and the enclosing environment of test1 is the global environment, like this:
Environments behave differently from other objects in R, in the sense that they don't get copied when passed to functions or used in assignments. The environment object itself consists internally of pointers to a frame, a hash table, and the enclosing environment.
The fact that an environment contains pointers makes all the difference. Environments are not all that easy to deal with; they're actually very tricky. Take a look at the code below:
> test <- new.env()
> test$a <- 1
> test2 <- test
> test2$a <- 2
> test$a
[1] 2
So the only things you copied from test into test2 are the pointers. If you change a value in test2, you change it in test as well. (Actually, you change that value only once, but test and test2 both point to the same frame.)
When you try to save an environment, R has no choice but to get the values for the frame, the hash table AND the enclosing environment and save those. As the enclosing environment is an environment in itself, R will also save all enclosing environments until it reaches the global environment. As the global environment is treated in a special way in the internal code, that one is (luckily) not saved in the file.
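The effect on size can be observed without writing any files. The sketch below is an illustration under assumptions: it uses serialize() as a stand-in for save() (both write the enclosing environments along with the object), and the names big_local, e_default and e_global are made up.

```r
# Hypothetical demonstration: an empty environment drags its enclosing
# environment along when it is serialized.
big_local <- function() {
  a <- runif(1e5)                              # large object in this call's environment
  e_default <- new.env()                       # enclosed by this call's environment
  e_global  <- new.env(parent = globalenv())   # enclosed by the global environment
  c(default = length(serialize(e_default, NULL)),
    global  = length(serialize(e_global, NULL)))
}

sizes <- big_local()
print(sizes)
```

The first number should be in the range of the size of a (about 800 kB for 1e5 doubles), while the second should be only a small number of bytes.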
Note the difference between an enclosing environment and a parent frame. Say we define our functions a bit differently:
a <- matrix(runif(1000000, 0, 1), ncol=1000)
save(a, file="bigfile.RData")
fn <- function() {
    load("bigfile.RData")
    test <- new.env()
    save(test, file="far-too-big.RData")
    test1 <- new.env(parent=globalenv())
    save(test1, file="right-size.RData")
}
fn2 <- function(){
    z <- matrix(runif(1000000,0,1),ncol=1000)
    fn()
}
fn2()
Now we have the following situation:
One would think that the file "far-too-big.RData" contains both matrix a and matrix z, but that's not the case. It contains only the matrix a. This is because the enclosing environment of fn is the global environment. The parent frame of fn is the environment of fn2, but the environment object created by fn contains a pointer to the global environment.
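The distinction between the two can be made visible with parent.env() and parent.frame(). This is a small sketch with made-up names (inner_fn, caller_fn, defined_in), assuming the code is run at top level.

```r
defined_in <- environment()   # where the functions below are defined

inner_fn <- function() {
  here <- environment()       # environment of this call
  list(
    enclosing = parent.env(here),   # where inner_fn was defined
    caller    = parent.frame()      # where inner_fn was called from
  )
}

caller_fn <- function() {
  z <- 1                      # lives only in caller_fn's environment
  res <- inner_fn()
  res$caller_env <- environment()
  res
}

res <- caller_fn()
identical(res$enclosing, defined_in)    # enclosing environment: definition site
identical(res$caller, res$caller_env)   # parent frame: call site
```

Only the enclosing environment is part of the environment object, which is why z does not end up in the saved file.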
On the other hand, if we do the following:
fn <- function() {
    load("bigfile.RData")
    test <- new.env()
    test$b <- a
    test2 <- new.env(parent=test)
    save(test2, file="far-too-big.RData")
}
test2 is now enclosed in two environments (test and the environment of fn), and both environments are saved in the file as well. So you get this situation:
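To see which environments would travel along with test2, one can walk the chain of enclosing environments. The helper enclosure_chain below is hypothetical and only a sketch; it stops at the global environment because, as described above, that one is not written to the file. A small vector stands in for the big matrix.

```r
# Hypothetical helper: list the object names in each enclosing environment,
# starting at 'env' and stopping before the global environment.
enclosure_chain <- function(env) {
  chain <- list()
  while (!identical(env, globalenv()) && !identical(env, emptyenv())) {
    chain[[length(chain) + 1]] <- ls(env)
    env <- parent.env(env)
  }
  chain
}

make_env <- function() {
  a <- 1:10                        # stand-in for the loaded matrix
  test <- new.env()
  test$b <- a
  test2 <- new.env(parent = test)
  enclosure_chain(test2)
}

chain <- make_env()
print(chain)
```

When run at top level, the chain has three entries: test2 itself (empty), test (holding b), and the environment of the function call (holding a, test and test2).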
Regardless of this, I personally avoid saving environments as environments, because there are more things that can go wrong. In my opinion, saving an environment as a list is the better choice in 99.9% of cases:
fn2 <- function(){
    load("bigfile.RData")
    test <- new.env()
    test$x <- "something"
    test$fn <- ls
    testlist <- as.list(test)
    save(testlist, file="right-size.RData")
}
fn2()
If you need it to be an environment, you can convert it back when loading.
load("right-size.RData")
test <- as.environment(testlist)
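A self-contained round trip might look like the sketch below. It is only an illustration: it writes to a temporary file instead of right-size.RData, and the names writer, path and restored are made up.

```r
path <- tempfile(fileext = ".RData")

writer <- function(path) {
  big <- runif(1e5)            # large local object that should not end up in the file
  test <- new.env()
  test$x <- "something"
  testlist <- as.list(test)    # plain list, no pointer to an enclosing environment
  save(testlist, file = path)
}
writer(path)

# Load into a scratch environment to avoid overwriting anything in the workspace
restored <- new.env()
load(path, envir = restored)
test <- as.environment(restored$testlist)

test$x
file.info(path)$size           # should be small, despite 'big' existing in writer()
```

Two caveats: as.environment() on a list gives an environment whose enclosure is the empty environment, and a closure stored in the list still carries its own enclosing environment, so functions defined inside another function can bring data along again.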
Upvotes: 9