Amy M

Reputation: 1095

Add function output to a data.table as new columns without naming them

I have a function which takes certain columns from an existing data.table as input, performs a calculation on them and then outputs the result as five new columns.

I would like to append the five new columns to my existing data.table, but cannot find a suitable way to do this without naming the columns (which seems superfluous, since the columns are already named in the output from the function and it already outputs a data.table).

Note: my real function is not vectorised, so I have to use the 'by' argument.

In addition, my real function is a wrapper for another function that produces model output. I convert that output to a table with as.data.table(pixiedust::dust(...)) so that I don't have to run the model multiple times to get each element of the output.

Here is a toy example:

# Load data.table:
library(data.table)

# Create data.table with example data:
mydt <- data.table(region = c("a", "b", "c"), 
                   count = c(0,50,200), 
                   pop = c(1000, 10000, 20000))

# Toy function:
rate <- function(count, pop, denom){

  dt = data.table(rawrate = count/pop, 
                  rateperpop = (count/pop)*denom)
  return(dt)

}

# Apply the function to mydt:
mydt[, rate(count = count, pop = pop, denom = 100000), by = 1:nrow(mydt)]

# which gives:
   nrow rawrate rateperpop
1:    1   0.000          0
2:    2   0.005        500
3:    3   0.010       1000

In the above example, the new columns are calculated but they are not added to mydt, which remains unchanged. I've tried chaining:

mydt[][, rate(count = count, pop = pop, denom = 100000), by = 1:nrow(mydt)]

... but this doesn't add the columns either.

If I try:

mydt[, .(rate(count = count, pop = pop, denom = 100000)), by = 1:nrow(mydt)]

I get an error because of the by clause. Even removing it (which I cannot do with my real function) just outputs the new variables; it does not add them to the existing data.table.

I'm sure there has to be a syntactically concise way to do this, but can't figure it out - any solutions would be much appreciated.

Upvotes: 1

Views: 657

Answers (1)

IceCreamToucan

Reputation: 28705

One option is to store the function's output in a temporary object, then assign it with := using names() of that object on the left-hand side:

new <- mydt[, rate(count = count, pop = pop, denom = 100000)]
mydt[, names(new) := new]
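The two lines above rely on the toy function being vectorised. If the function has to be called one row at a time, as described in the question, the same idea should still work with a by clause, as long as the grouping column is dropped before assigning. A minimal sketch, assuming a stand-in rate() whose length check is added here only to mimic a non-vectorised function, and a grouping column named row (both are illustrative, not from the original post):

```r
library(data.table)

mydt <- data.table(region = c("a", "b", "c"),
                   count = c(0, 50, 200),
                   pop = c(1000, 10000, 20000))

# Stand-in for a non-vectorised function: the length check is an
# assumption added here so that the function only accepts one row.
rate <- function(count, pop, denom) {
  stopifnot(length(count) == 1L, length(pop) == 1L)
  data.table(rawrate = count / pop,
             rateperpop = (count / pop) * denom)
}

# Compute row by row; the grouping column is explicitly named "row".
new <- mydt[, rate(count = count, pop = pop, denom = 100000),
            by = .(row = seq_len(nrow(mydt)))]

# Drop the grouping column so only the function's own columns remain.
new[, row := NULL]

# Add the remaining columns to mydt by reference, reusing their names.
mydt[, (names(new)) := new]
```

Because each row forms its own group and groups are returned in order of first appearance, the rows of new line up with the rows of mydt.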

Another option is to change the function so that it modifies your data.table by reference:

rate <- function(dt, count, pop, denom){
  dt[, `:=`(rawrate = count/pop, 
            rateperpop = (count/pop)*denom)]
}

mydt
#    region count   pop
# 1:      a     0  1000
# 2:      b    50 10000
# 3:      c   200 20000

rate(mydt, count = count, pop = pop, denom = 100000)

mydt
#    region count   pop rawrate rateperpop
# 1:      a     0  1000   0.000          0
# 2:      b    50 10000   0.005        500
# 3:      c   200 20000   0.010       1000
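Note that inside dt[...] the names count and pop resolve to the columns of dt, so the arguments of the same name are never actually evaluated. If you would rather make the column choice explicit, one possible variant passes the column names as strings and adds the results with set(). This is only a sketch: rate_by_name, count_col and pop_col are hypothetical names introduced for illustration.

```r
library(data.table)

mydt <- data.table(region = c("a", "b", "c"),
                   count = c(0, 50, 200),
                   pop = c(1000, 10000, 20000))

# Hypothetical variant: count_col and pop_col are column names (strings).
rate_by_name <- function(dt, count_col, pop_col, denom) {
  # Pull the two input columns out by name and compute the raw rate.
  raw <- dt[[count_col]] / dt[[pop_col]]
  # Add both result columns to dt by reference.
  set(dt, j = c("rawrate", "rateperpop"), value = list(raw, raw * denom))
  invisible(dt)
}

rate_by_name(mydt, count_col = "count", pop_col = "pop", denom = 100000)
```

As with the version above, mydt is modified in place, so no reassignment is needed.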

Upvotes: 3
