zkurtz

Reputation: 3288

rxDataStep transforms argument fails on user-defined functions

For example:

require(RevoScaleR)

# Create a data frame
set.seed(100)
myData = data.frame(x = 1:100, y = rep(c("a", "b", "c", "d"), 25),
                     z = rnorm(100), w = runif(100))

# Create a multi-block .xdf file from the data frame
inputFile = file.path(tempdir(), "testInput.xdf")
rxDataStep(inData = myData, outFile = inputFile, rowsPerRead = 50, 
           overwrite = TRUE)

# Square the values in the column "z"; this works fine
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transforms = list(z = z^2))

# Define a squaring function and try to use it to repeat the previous step:
myFun = function(x) x^2
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transforms = list(z = myFun(z)))

The final step fails with the error:

Error in transformation function: Error in eval(expr, envir, enclos) : could not find function "myFun"

The documentation for rxDataStep states that "As with all expressions, transforms ... can be defined outside of the function call using the expression function." But I have no idea how to implement this advice, and can't find an example. For instance, the following does not work:

myFun = expression(function(x) x^2)
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transforms = list(z = myFun(z)))

Upvotes: 1

Views: 982

Answers (2)

Derek McCrae Norton

Reputation: 854

You can certainly pass an expression that was created outside of the function call to the transforms argument.

It would look something like this:

myFun <- expression(
  list(x2 = x^2,
       z2 = z^2))
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transforms = myFun)
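
To see what such a stored expression holds, here is a small base R sketch that does not need RevoScaleR. It evaluates the expression against a list standing in for one block of data. The names `chunk` and `evalTransforms` are made up for illustration, and this is only an analogy for how rxDataStep might treat `transforms`, not its actual implementation.

```r
# A stored, unevaluated transforms expression, as in the answer above
myTransforms <- expression(list(x2 = x^2, z2 = z^2))

# Hypothetical stand-in for one block of rows read from the .xdf file
chunk <- list(x = 1:3, z = c(0.5, -1, 2))

# Hypothetical helper: evaluate the expression with the chunk's columns in scope
evalTransforms <- function(expr, data) {
  eval(expr[[1]], envir = data, enclos = baseenv())
}

newCols <- evalTransforms(myTransforms, chunk)
str(newCols)
```

The point is that `expression()` captures the whole `list(...)` call without running it, so the column names are only resolved later, when the data is available.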

If you want to pass a function, as in your first example, use the transformFunc argument instead. The function receives the current block of data as a list of columns and must return that list:

myFun2 <- function(dataList){
  dataList$x2 <- dataList$x^2
  dataList$z2 <- dataList$z^2
  dataList
}
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transformFunc = myFun2)
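
The list-in, list-out contract can be tried without RevoScaleR. The sketch below applies the same function block by block to a toy data frame. The names `toy`, `blocks`, and `processed` are hypothetical, and splitting with `split()` is only a rough stand-in for the way a multi-block .xdf file is read.

```r
# Same shape as the transformFunc above: take a list of columns, return the list
myFun2 <- function(dataList) {
  dataList$x2 <- dataList$x^2
  dataList$z2 <- dataList$z^2
  dataList
}

# Toy data split into two blocks of two rows each
toy <- data.frame(x = 1:4, z = c(1, 2, 3, 4))
blocks <- split(toy, rep(1:2, each = 2))

# Apply the function to each block, passing the block as a list of columns
processed <- lapply(blocks, function(b) as.data.frame(myFun2(as.list(b))))
result <- do.call(rbind, processed)
rownames(result) <- NULL
print(result)
```

Because the function only ever sees one block at a time, it should not rely on statistics of the whole column (a global mean, for instance) unless those are computed beforehand.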

Upvotes: 2

zkurtz

Reputation: 3288

No idea why this works, but it does:

env <- new.env()
env$myFun <- function(x) x^2
rxDataStep(inData = inputFile, outFile = inputFile, overwrite = TRUE,
           transforms = list(z = myFun(z)), transformEnvir=env) 
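
A plausible explanation, offered as a guess rather than a statement about RevoScaleR internals: the transforms appear to be evaluated somewhere that cannot see the global workspace, so `myFun` is only found once an environment containing it is supplied. The base R sketch below reproduces the same pattern with plain `eval()`. The names `chunk` and `transformExpr` are hypothetical.

```r
# Defined in the global workspace, like myFun in the question
myFun <- function(x) x^2

# Hypothetical stand-ins for one block of data and the transforms expression
chunk <- list(z = c(1, 2, 3))
transformExpr <- quote(list(z = myFun(z)))

# With an enclosure that cannot see the global workspace, the lookup fails
res1 <- tryCatch(
  eval(transformExpr, envir = chunk, enclos = baseenv()),
  error = function(e) conditionMessage(e)
)
print(res1)

# Supplying an environment that contains myFun makes the lookup succeed
env <- new.env(parent = baseenv())
env$myFun <- function(x) x^2
res2 <- eval(transformExpr, envir = chunk, enclos = env)
print(res2$z)
```

rxDataStep also has a transformObjects argument, which takes a named list of objects to make available to the transforms, so `transformObjects = list(myFun = myFun)` may be an alternative worth trying. That is untested here.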

Upvotes: 1
