rafa.pereira

Reputation: 13807

Apply function to multiple data tables

I have several data tables with the same structure, and I want to apply a few data transformations to each of them (create new variables, assign missing values, etc.).

This is what I've tried, without success. The code runs without errors, but it does not change the data tables. Any ideas?

For a reproducible example, run this snippet of code first:

library(data.table)         # provides setDT() and :=
data("mtcars")              # load data
setDT(mtcars)               # convert to data table
mtcars[gear==5, gear :=NA]  # create NA values for the purpose of my application
mtcars2 <- mtcars           # create second DT

My code

# Create function
computeWidth <- function(dataset){
  dataset$gear[is.na(dataset$gear)] <- 0 # Convert NA to 0
  dataset[ ,width := hp + gear]          # create new variable
}

# Apply function
lapply(list(mtcars, mtcars2), computeWidth)

As you can see, the function works fine, but it doesn't modify the data tables. Any thoughts on this?

Upvotes: 4

Views: 2020

Answers (1)

David Arenburg

Reputation: 92282

Your main problem is the assignment syntax. Instead of dataset$gear[is.na(dataset$gear)] <- 0 you should be using dataset[is.na(gear), gear := 0]. The := operator modifies the data.table by reference, so the change reaches your original data set outside of the function called by lapply, whereas <- assigns to a modified copy that exists only inside that function. Thus, modifying your function to

computeWidth <- function(dataset){
  dataset[is.na(gear), gear := 0] # Convert NA to 0
  dataset[ ,width := hp + gear]   # create new variable
}

and then running

lapply(list(mtcars, mtcars2), computeWidth) 

will modify the original data sets.

As a side note, if you want to generalize this to many data.table objects, you could look into the tables function and try something like the following

lapply(mget(tables(silent = TRUE)$NAME), computeWidth)

Though it is always best to keep many objects in a single list in the first place instead of filling your global environment with many objects.
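As a rough sketch of the list-based approach, the tables can live in one named list from the start and be updated in place. The list name dt_list and the toy tables below are made up for illustration; they are not from the question.

```r
library(data.table)

# Hypothetical toy tables with the same structure (illustration only)
dt_list <- list(
  first  = data.table(hp = c(110, 93, 175), gear = c(4, NA, 3)),
  second = data.table(hp = c(245, 62),      gear = c(NA, 4))
)

computeWidth <- function(dataset) {
  dataset[is.na(gear), gear := 0]  # convert NA to 0, by reference
  dataset[, width := hp + gear]    # add the new column, by reference
}

# invisible() only suppresses printing of the tables returned by lapply
invisible(lapply(dt_list, computeWidth))

# The tables stored in the list now carry the new column
dt_list$first
dt_list$second
```

Because := works by reference, there is no need to assign the result of lapply back to dt_list.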


A very important note (suggested by @Frank): you should be aware that when using <- on an unmodified data.table, you are not actually creating a new object; both names point to the same table

mtcars2 <- mtcars
tracemem(mtcars)
## [1] "<00000000129264F8>"
tracemem(mtcars2)
## [1] "<00000000129264F8>"

Thus, modifying mtcars by reference (for example with :=) will also modify mtcars2. Instead, the correct practice is to use copy, as in

mtcars2 <- copy(mtcars)
tracemem(mtcars)
## [1] "<00000000129264F8>"
tracemem(mtcars2)
## [1] "<000000001315F6B8>"
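A small sketch of what this means in practice, using a made-up toy table (the names dt, alias and indep are hypothetical). It uses data.table's address() to compare memory locations and := to make a by-reference change.

```r
library(data.table)

dt <- data.table(x = 1:3)   # hypothetical toy table

alias <- dt                 # plain assignment: a second name for the same object
indep <- copy(dt)           # copy(): a separate, independent object

dt[, y := x * 2L]           # add a column to dt by reference

names(alias)                # alias sees the new column as well
names(indep)                # the copy is left untouched

address(dt) == address(alias)   # same memory address
address(dt) == address(indep)   # different memory address
```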

See here for further details.

Upvotes: 6
