Reputation: 1095
I have a function which takes certain columns from an existing data.table as input, performs a calculation on them and then outputs the result as five new columns.
I would like to append the five new columns to my existing data.table, but I cannot find a suitable way to do this without naming the columns explicitly (which seems superfluous, since the function already returns a data.table whose columns are named).
Note: my real function is not vectorised, so I have to use the 'by' argument.
In addition, my real function is a wrapper for another function that produces model output, so I have converted that output to a table with as.data.table(pixiedust::dust(...)) so that I don't have to run it multiple times to get each element of the output.
Here is a toy example:
# Load data.table:
library(data.table)
# Create data.table with example data:
mydt <- data.table(region = c("a", "b", "c"),
                   count = c(0, 50, 200),
                   pop = c(1000, 10000, 20000))
# Toy function:
rate <- function(count, pop, denom){
  dt <- data.table(rawrate = count/pop,
                   rateperpop = (count/pop)*denom)
  return(dt)
}
# Apply the function to mydt:
mydt[, rate(count = count, pop = pop, denom = 100000), by = 1:nrow(mydt)]
# which gives:
#    nrow rawrate rateperpop
# 1:    1   0.000          0
# 2:    2   0.005        500
# 3:    3   0.010       1000
In the above example, the new columns are calculated but they are not added to mydt, which remains unchanged. I've tried chaining:
mydt[][, rate(count = count, pop = pop, denom = 100000), by = 1:nrow(mydt)]
... but this doesn't add the columns either.
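Neither call changes mydt because a plain j expression in data.table builds and returns a new object; adding columns to the existing table by reference is what := (and set()) are for. A minimal illustration on the question's toy data, where the single rawrate column is just for the demo:

```r
library(data.table)

mydt <- data.table(region = c("a", "b", "c"),
                   count  = c(0, 50, 200),
                   pop    = c(1000, 10000, 20000))

# A plain j expression returns a new data.table;
# mydt itself keeps its original three columns.
res <- mydt[, .(rawrate = count / pop)]
ncol(mydt)  # 3

# := adds the column to mydt itself, by reference.
mydt[, rawrate := count / pop]
ncol(mydt)  # 4
```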
If I try:
mydt[, .(rate(count = count, pop = pop, denom = 100000)), by = 1:nrow(mydt)]
I get an error because of the by clause, and even removing it (which I cannot do with my real function) just outputs the new variables; it does not add them to the existing data.table.
I'm sure there has to be a syntactically concise way to do this, but I can't figure it out. Any solutions would be much appreciated.
Upvotes: 1
Views: 657
Reputation: 28705
One option is to create a temporary object and then use := with names() of that object on the LHS:
new <- mydt[, rate(count = count, pop = pop, denom = 100000)]  # toy rate() is vectorised, so no by here
mydt[, names(new) := new]  # add every column of new to mydt by reference
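Since the question's real function is not vectorised, the temporary object could instead be built with by. A sketch, assuming the toy rate() from the question stands in for the real function; the grouping column name row_id is arbitrary and is dropped before the assignment:

```r
library(data.table)

mydt <- data.table(region = c("a", "b", "c"),
                   count  = c(0, 50, 200),
                   pop    = c(1000, 10000, 20000))

# Toy function from the question; returns a one-row data.table per call
rate <- function(count, pop, denom) {
  data.table(rawrate    = count / pop,
             rateperpop = (count / pop) * denom)
}

# Call rate() once per row; naming the grouping column makes it easy to drop
new <- mydt[, rate(count = count, pop = pop, denom = 100000),
            by = .(row_id = seq_len(nrow(mydt)))]
new[, row_id := NULL]

# Append all remaining columns of new to mydt by reference, reusing their names
mydt[, names(new) := new]
mydt
```

Each row's calculation runs only once; the price is one intermediate table holding the new columns before they are attached.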
Another option is to change the function so that it modifies your data.table itself, by reference:
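rate <- function(dt, count, pop, denom){
  # Inside dt[...], count and pop are looked up as columns of dt first,
  # so the count and pop arguments are not actually used here
  dt[, `:=`(rawrate = count/pop,
            rateperpop = (count/pop)*denom)]
}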
mydt
#    region count   pop
# 1:      a     0  1000
# 2:      b    50 10000
# 3:      c   200 20000
rate(mydt, count = count, pop = pop, denom = 100000)
mydt
#    region count   pop rawrate rateperpop
# 1:      a     0  1000   0.000          0
# 2:      b    50 10000   0.005        500
# 3:      c   200 20000   0.010       1000
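If the calculation really cannot be vectorised, the in-place version could pass by to the same := call. A minimal sketch, assuming the toy calculation stands in for the real one; rate_by_row is a hypothetical name, and count and pop are read from dt's columns rather than passed as arguments:

```r
library(data.table)

# Hypothetical in-place variant: evaluates the calculation once per row
# and adds the new columns to dt by reference.
rate_by_row <- function(dt, denom) {
  # Inside dt[...], count and pop refer to columns of dt
  dt[, `:=`(rawrate    = count / pop,
            rateperpop = (count / pop) * denom),
     by = seq_len(nrow(dt))]
  invisible(dt)
}

mydt <- data.table(region = c("a", "b", "c"),
                   count  = c(0, 50, 200),
                   pop    = c(1000, 10000, 20000))

rate_by_row(mydt, denom = 100000)
mydt
```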
Upvotes: 3