Backlin

Reputation: 14852

Pass an object to a function without copying it on change

My question

If an object x is passed to a function f that modifies it, R will create a modified local copy of x within f's environment rather than changing the original object (due to the copy-on-modify principle). However, I have a situation where x is very big and is not needed once it has been passed to f, so I want to avoid keeping the original copy of x once f is called. Is there a clever way to achieve this?

f is an unknown function to be supplied by a possibly not very clever user.

My current solution

The best I have so far is to wrap x in a function forget that makes a new local reference to x called y, removes the original reference in the workspace, and then passes on the new reference. The problem is that I am not certain it accomplishes what I want, and it only works in globalenv(), which is a deal breaker in my current case.

forget <- function(x){
    y <- x
    # x and y now refer to the same object, which has not yet been copied
    print(tracemem(y))
    rm(list=deparse(substitute(x)), envir=globalenv())
    # The outside reference is now removed, so modifying `y`
    # should no longer result in a copy (other than the
    # intermediate copy produced in the assignment)
    y
}

f <- function(x){
    print(tracemem(x))
    x[2] <- 9000.1
    x
}

Here is an example of calling the above functions.

> a <- 1:3
> tracemem(a)
[1] "<0x2ac1028>"
> b <- f(forget(a))
[1] "<0x2ac1028>"
[1] "<0x2ac1028>"
tracemem[0x2ac1028 -> 0x2ac1e78]: f 
tracemem[0x2ac1e78 -> 0x308f7a0]: f 
> tracemem(b)
[1] "<0x308f7a0>"
> b
[1]    1.0 9000.1    3.0
> a
Error: object 'a' not found
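
A side note on the trace above: a is created as 1:3, which is an integer vector, while 9000.1 is a double. The minimal sketch below only shows that the assignment changes the vector's type. It is an assumption, not something verified here, that this conversion accounts for one of the two tracemem lines; the variable names type_before and type_after are made up for illustration.

```r
# 1:3 is stored as an integer vector
a <- 1:3
type_before <- typeof(a)

# Assigning a non-integer value forces the whole vector to be
# converted to double, which needs a newly allocated vector
a[2] <- 9000.1
type_after <- typeof(a)

print(c(before = type_before, after = type_after))
```

If that assumption holds, testing with a double vector such as c(1, 2, 3) would separate the cost of the type conversion from the cost of the copy-on-modify duplication.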

Bottom line

Am I doing what I hope I am doing and is there a better way to do it?
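
Regarding the globalenv() limitation, one possible variation is to remove the binding from the frame that called forget instead of from the global environment. The sketch below assumes the argument is always a plain variable name; forget_local and caller are hypothetical names introduced only for this illustration. It addresses where the binding is removed, not whether R actually avoids the copy, which depends on the R version and its reference tracking.

```r
# Hypothetical variant of forget(): removes the caller's binding
# rather than assuming the object lives in globalenv()
forget_local <- function(x) {
    name <- deparse(substitute(x))   # name of the variable passed in
    y <- x                           # forces the argument, keeps a local reference
    rm(list = name, envir = parent.frame())
    y
}

# Same modifying function as in the question, without tracemem
f <- function(x) {
    x[2] <- 9000.1
    x
}

# The object now lives inside a function, not in the workspace
caller <- function() {
    a <- c(1, 2, 3)
    b <- f(forget_local(a))
    list(b = b, a_exists = exists("a", inherits = FALSE))
}

res <- caller()
print(res)
```

Because forget_local(a) is evaluated lazily inside f, parent.frame() is used here on the understanding that it refers to the environment in which the call was written, which is the body of caller.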

Upvotes: 2

Views: 755

Answers (2)

G. Grothendieck

Reputation: 269824

(1) Environments. You can use environments for that:

e <- new.env()
e$x <- 1:3
f <- function(e) with(e, x <- x + 1)
f(e)
e$x
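
Applied to the question's scenario, the environment approach might look like the sketch below. The names big and f_env are hypothetical, and the small vector stands in for the large object. The point illustrated is only that the change made inside the function is visible to the caller through the environment, so no separate modified copy has to be returned and stored next to the original.

```r
# Hypothetical container for the large object
big <- new.env()
big$x <- c(1, 2, 3)   # stands in for a very large vector

# A user-supplied function that modifies the object through the
# environment instead of working on a function argument
f_env <- function(e) {
    e$x[2] <- 9000.1
    invisible(e)
}

f_env(big)
print(big$x)
```

This requires the user-supplied function to be written against the environment, which may or may not be acceptable when f comes from someone else.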

(2) Reference classes. Or, since reference classes are built on environments, use those:

E <- setRefClass("E", fields = "x",
    methods = list(
        f = function() x <<- x + 1
    )
)
e <- E$new(x = 1:3)
e$f()
e$x

(3) proto. proto objects also use environments:

library(proto)
p <- proto(x = 1:3, f = function(.) with(., x <- x + 1))
p$f()
p$x

ADDED: proto solution

UPDATED: Changed function name to f for consistency with question.

Upvotes: 6

Dinre

Reputation: 4216

I think the easiest approach is to load only the working copy into memory, instead of both the original (in the global environment) and the working copy (in the function's environment). You can sidestep the whole issue by using the 'ff' package to define your 'x' and 'y' data sets as 'ffdf' data frames. As I understand it, 'ffdf' data frames reside on disk; parts of the data frame are loaded into memory only as they are needed and are purged when they are no longer necessary. In theory, this means the data would be loaded into memory only to be copied into the function's environment and then purged once the copy is complete.

I'll admit that I rarely have to use the 'ff' package, and when I do, I usually don't have any issues at all. I'm not checking specific memory usage, though, and my goal is usually just to perform a large calculation across the data. It works, and I don't ask questions.

Upvotes: 1
