Jakob

Reputation: 1453

R: how to create and address columns of a data.table inside a function

I want to write a function in which I build several new columns. I have trouble addressing and naming the columns of a data.table inside a function. Say I have:

library(data.table)
DT <- as.data.table(iris)

and say I am interested in creating new variables in the same way for different columns (let's say Sepal.Length, Sepal.Width and Petal.Length). I want a function that sums all observations by species into a new column and then takes the ratio of this sum to each observation's Petal.Width. I have a working example if I hard-code the names of the new columns:

thisworks <- function(a){
DT[,named_column1:=sum(eval(a),na.rm=T),by=Species]
DT[,named_column2:=named_column1/Petal.Width]
DT
}
DT <- thisworks(DT[,Sepal.Length])

However, if I want to do this for other variables (Sepal.Width and Petal.Length, resulting in a total of 6 new columns), I would like to assign each new column a name based on the input column. My non-working attempt:

thisdoesntwork <- function(b){
name1 <- paste0("total_",names(b))
#here, I don't know how to get the name of the column
DT[,assign(name1,sum(eval(column_of_interest),na.rm=T)),by=Species]
name2 <- paste0("ratio_",names(b))
DT[,assign(name2,named_column1/Petal.Width)]
DT
}

for (i in c("Sepal.Length", "Sepal.Width", "Petal.Length")){
# I know I know, loops are evil. However, data.table is fast so i dont mind
DT <- thisdoesntwork(DT[,i,with=F])
}

General hints are also very much appreciated. Would it be wiser to build two functions, for example, one for each task? Or should I write the whole function inside one big data.table filter? I think that could also work, no?

Edit: Second Case: Create two new variables using two different columns each

(this is answered in Roland's edit)

Say I want to have ratio_Length=Sepal.Length/Petal.Length and ratio_Width=Sepal.Width/Petal.Width. Why does this not work?:

DT <- as.data.table(iris)

myfun <- function(d,variables){
d[, paste0("ratio_",substr(variables,7,99)) := 
mapply(.SD, function(x) x / mget(paste0("Petal.",substr(variables,7,99))),
.SDcols = variables)]
d[]
}
DT <- myfun(DT,c("Sepal.Length","Sepal.Width"))

Upvotes: 3

Views: 799

Answers (2)

dal233

Reputation: 80

In answer to the first part of the question I came up with this. It's closer to the author's own attempt at a solution and I personally prefer the syntax. It uses a for loop, but the author says he doesn't mind that.

library(data.table)
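thisworks <- function(dat, var){
  name1 <- paste0("total.", var)
  name2 <- paste0("ratio.", var)
  # get(var) is resolved within each Species group, so the sum is per species
  dat[, (name1) := sum(get(var), na.rm = TRUE), by = Species]
  # divide the per-species total by each row's Petal.Width
  dat[, (name2) := get(name1) / Petal.Width]
}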

DT <- as.data.table(iris)
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
for (i in seq_along(cols)){
  thisworks(DT, cols[i])
}

Upvotes: 1

Roland

Reputation: 132696

There is no reason to assign the result of the function. := assigns by reference anyway. If you want to make a copy of the data.table, you need to make an explicit copy in the beginning of your function.
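To illustrate the difference, here is a minimal sketch. The two helper functions and the flag column are hypothetical names used only for this example; copy() is data.table's function for making an explicit copy.

```r
library(data.table)

# Hypothetical helper: adds a column to the caller's table in place.
add_flag_in_place <- function(d) {
  d[, flag := TRUE]   # := modifies the table passed in, by reference
  invisible(d)
}

# Hypothetical helper: works on an explicit copy instead.
add_flag_on_copy <- function(d) {
  d <- copy(d)        # explicit copy; the caller's table stays untouched
  d[, flag := TRUE]
  d[]
}

DT <- as.data.table(iris)

res <- add_flag_on_copy(DT)
"flag" %in% names(DT)    # FALSE: only the copy was changed

add_flag_in_place(DT)    # note: no assignment of the result
"flag" %in% names(DT)    # TRUE: DT itself was changed
```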

You can use .SD and .SDcols. See the data.table vignettes for details.

thisworks <- function(d, cols){
  d[, paste0(cols, 1) := lapply(.SD, sum, na.rm = TRUE), by=Species, .SDcols = cols]
  d[, paste0(cols, 2) := lapply(.SD, function(x) x / d[["Petal.Width"]]), 
     .SDcols = paste0(cols, 1)]
  d[]
}
library(data.table)
DT <- as.data.table(iris)
thisworks(DT, c("Sepal.Length", "Sepal.Width", "Petal.Length"))
print(DT)

Edit regarding your follow-up question:

myfun <- function(d,variables){
  d[, gsub(".*\\.", "Ratio.", variables) := 
      Map("/", mget(variables), mget(gsub(".*\\.", "Petal.", variables)))] 
  # It might be more efficient to do
  # Map(function(x, y) get(x)/get(y), variables, gsub(".*\\.", "Petal.", variables))
  # instead; run some benchmarks to check.
  d[]
}
myfun(DT,c("Sepal.Length","Sepal.Width"))

And I can only reiterate: Do not assign when you call the function. The data.table passed to it is changed by reference.

For users unfamiliar with the first argument of gsub() above, please refer to ?regex for help.
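As a quick illustration of what that pattern does with the column names from the question: ".*\\." greedily matches everything up to and including the last dot, so the prefix is swapped and the suffix is kept. The variable names new_names and petal_names are used only for this sketch.

```r
variables <- c("Sepal.Length", "Sepal.Width")

# Replace everything up to the last dot with a new prefix
new_names   <- gsub(".*\\.", "Ratio.", variables)
petal_names <- gsub(".*\\.", "Petal.", variables)

new_names     # "Ratio.Length" "Ratio.Width"
petal_names   # "Petal.Length" "Petal.Width"
```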

Upvotes: 6
