Iterator

Reputation: 20570

Examining contents of .rdata file by attaching into a new environment - possible?

I am interested in listing objects in an RDATA file and loading only selected objects, rather than the whole set (in case some may be big or may already exist in the environment). I'm not quite clear on how to do this when there are conflicts in names, as attach() doesn't work as nicely.

1: Examining the contents of an R data file without loading it. This question is similar to, but different from, the one asked at "listing contents of an R data file without loading".

In that case, the solution offered was:

attach(filename)
ls(pos = 2)
detach()

If there are naming conflicts between objects in the file and those in the global environment, this warning appears: The following object(s) are masked _by_ '.GlobalEnv':

I tried creating a new environment, but I cannot seem to attach into it. For instance, this produces the same warning:

lsfile   <- function(filename){
  tmpEnv <- new.env()
  evalq(attach(filename), envir = tmpEnv)
  tmpls <- ls(pos = 2)
  detach()
  return(tmpls)
}
lsfile(filename)

Maybe I've made a mess of things with evalq (or eval). Is there some other way to avoid the naming conflict?

2: If I want to access an object - if there are no naming conflicts, I can just work with the one from the .rdat file, or copy it to a new one. If there are conflicts, how does one access the object in the file's namespace?

For instance, if my file is "sample.rdat", and the object is surveyData, and a surveyData object already exists in the global environment, then how can I access the one from the file:sample.rdat namespace?

I currently solve this problem by loading everything into a temporary environment and then copying out what's needed, but this is inefficient.

Upvotes: 12

Views: 6256

Answers (4)

Simon Urbanek

Reputation: 13942

Since this question has just been referenced let's clarify two things:

  1. when given a file name, attach() simply calls load(), so there is really no point in using it instead of load()

  2. if you want selective access to prevent masking it's much easier to simply load the file into a new environment:

    e = local({load("foo.RData"); environment()})
    

    You can then use ls(e) and access contents like e$x. You can still use attach on the environment if you really want it on the search path.
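A minimal sketch of that workflow, for illustration only. The temporary file, the object names (surveyData, other) and the variable surveyData_from_file are assumptions made up for this example; they are not part of the original answer.

```r
# Create a demo file; the path and object names are illustrative assumptions.
rdata_path <- tempfile(fileext = ".RData")
local({
  surveyData <- data.frame(id = 1:3, score = c(10, 20, 30))
  other <- letters
  save(surveyData, other, file = rdata_path)
})

# An object with the same name already exists in the calling environment.
surveyData <- "global copy"

# Load the file into its own environment, so nothing is masked or overwritten.
e <- local({load(rdata_path); environment()})

contents <- ls(e)                      # names of the objects stored in the file
surveyData_from_file <- e$surveyData   # copy out only what is needed
```

The existing surveyData is left untouched, and the copy from the file is available under a different name. The whole file is still read once, so this avoids the naming conflict rather than the loading cost.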

FWIW, .RData files have no index (the objects are stored in one big pairlist), so you can't list the contained objects without loading. If you want convenient access, convert the file to the lazy-load format instead, which simply adds an index so that each object can be loaded separately (see "Get specific object from Rdata file").
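The conversion might look roughly like the sketch below. Note that tools:::makeLazyLoadDB is an unexported internal function, so its behaviour may differ between R versions; the file names and the objects big and small are assumptions invented for the example.

```r
# Demo objects saved to a throwaway file (names and paths are illustrative).
rdata_path <- tempfile(fileext = ".RData")
local({
  big <- rnorm(1e5)
  small <- "foo"
  save(big, small, file = rdata_path)
})

# One full load is still needed in order to build the index.
e <- local({load(rdata_path); environment()})

# Internal, unexported helper: writes <db_base>.rdb and <db_base>.rdx.
db_base <- tempfile("lazydb")
tools:::makeLazyLoadDB(e, db_base)

# lazyLoad() only creates promises; an object is read from disk on first use.
lazy_env <- new.env()
lazyLoad(db_base, envir = lazy_env)
lazy_names <- ls(lazy_env)        # listing does not force any promise
small_value <- lazy_env$small     # forces only the promise for 'small'
```

After the one-off conversion, later sessions can call lazyLoad() on the database and touch individual objects without reading the others.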

Upvotes: 24

Iterator

Reputation: 20570

Thanks to @Dirk and @Joshua.

I had an epiphany. The command/package foreach with SMP or MC seems to produce environments that only inherit, but do not seem to conflict with, the global environment.

library(foreach)  # a parallel backend (e.g. doMC) must also be registered for %dopar%

lsfile <- function(list_files){
  aggregate_ls <- foreach(ix = seq_along(list_files)) %dopar% {
    attach(list_files[ix])
    tmpls <- ls(pos = 2)
    tmpls
  }
  aggregate_ls
}

lsfile("f1.rdat")
lsfile(dir(pattern = "rdat$"))

This is useful to me because I can now parallelize this. This is a bare-bones version, and I will modify it to give more detailed information, but so far it seems to be the only way to avoid conflicts, even without ignoring the warnings.

So, question #1 can be resolved by either ignoring the warnings (as @Joshua suggested) or by using whatever magic foreach summons.

For part 2, loading an object, I think @Joshua has the right idea - "get" will do.

The foreach magic can also work, by using the .noexport option. However, this has risks: whatever isn't specifically excluded will be inherited/exported from the global environment (I could do ls(), but there's always the possibility of attached datasets). For safety, this means that get() must still be used to avoid the risk of a naming conflict. Loading into a subenvironment avoids the naming conflict, but doesn't avoid the loading of unnecessary objects.

@Joshua's answer is far simpler than my foreach detour.

Upvotes: 2

Dirk is no longer here

Reputation: 368639

I just use an envir= argument to load():

> x <- 1; y <- 2; z <- "foo"
> save(x, y, z, file="/tmp/foo.RData")
> ne <- new.env()
> load(file="/tmp/foo.RData", envir=ne)
> ls(envir=ne)
[1] "x" "y" "z"
> ne$z
[1] "foo"

The cost of this approach is that you do read the whole RData file, but that seems to be unavoidable anyway, as no other method seems to offer a list of the 'content' of such a file.
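Building on the same idea, a small helper can report what a file contains without touching the global environment. This is only a sketch: the function name ls_rdata and the columns it returns are hypothetical choices, not something from the original answer.

```r
# Hypothetical helper: list the objects in an .RData file with class and size.
ls_rdata <- function(file) {
  env <- new.env()
  nms <- load(file, envir = env)   # load() invisibly returns the object names
  data.frame(
    name  = nms,
    class = vapply(nms, function(n) class(get(n, envir = env))[1],
                   character(1), USE.NAMES = FALSE),
    bytes = vapply(nms, function(n) as.numeric(object.size(get(n, envir = env))),
                   numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Demo on a throwaway file (path and objects are illustrative).
demo_path <- tempfile(fileext = ".RData")
local({
  x <- 1; y <- 2; z <- "foo"
  save(x, y, z, file = demo_path)
})
info <- ls_rdata(demo_path)
```

The temporary environment is discarded when the function returns, so only the summary table remains in memory.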

Upvotes: 8

Joshua Ulrich

Reputation: 176738

You can suppress the warning by setting warn.conflicts=FALSE on the call to attach. If an object is masked by one in the global environment, you can use get to retrieve it from your attached data.

x <- 1:10
save(x, file="x.rData")
#attach("x.rData", pos=2, warn.conflicts=FALSE)
attach("x.rData", pos=2)
(x <- 1)
# [1] 1
(x <- get("x", pos=2))
# [1]  1  2  3  4  5  6  7  8  9 10
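As a variation, the attached file can also be addressed by its search-path name rather than by position, which may be more robust if other packages get attached in between. The sketch below assumes the default name that attach() gives to a file, "file:" followed by the path, and uses a temporary file invented for the example.

```r
x <- 1:10
rdata_path <- tempfile(fileext = ".RData")
save(x, file = rdata_path)
x <- 1                                   # this x now masks the one in the file

attach(rdata_path, warn.conflicts = FALSE)
entry <- paste0("file:", rdata_path)     # default search-path name for an attached file

file_x <- get("x", pos = entry)          # look up by name instead of by position
detach(entry, character.only = TRUE)     # remove the entry from the search path
```

Detaching by name keeps the search path clean, so repeated calls do not pile up attached copies of the same file.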

Upvotes: 4
