Reputation: 401
I would like to apply an ifelse function looping through 2 vectors and then based on the condition update the columns of a data table. I'm looking for a solution which can work on large number of columns.
I'm demonstrating the problem through a toy dataset mtcars
.
library(data.table)
mtcars <- data.table(mtcars)
Now, I would like to limit the values for some of the columns and replace the values of the corresponding columns with the defined limits. But the below code is giving me weird results.
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- sapply(limitlist, "[[", 2)
for (i in length(limitlist)) mtcars[, c(cols) := lapply(.SD, function(x){ifelse(x[i] > lims[i], lims[i], x[i])}), .SDcols = cols]
My desired output:
range(mtcars$hp)
[1] 52 300
range(mtcars$disp)
[1] 71.1 450.0
I'm new to data.table syntax so might be a dumb error. Any help on this is highly appreciated.
Upvotes: 2
Views: 209
Reputation: 42564
That is pretty advanced stuff for a newbie to data.table
. However, here are three other variants:
set()
These approaches update only the affected elements in the respective column vectors while the other solutions posted so far (Frank's, Dan Y's) replace the whole column. There might be a performance gain if only a few elements need to be replaced.
Note that we are using limitlist
as provided by the OP.
# subsetting and updating
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
DT[get(cols[i]) > lims[i], (cols[i]) := lims[i]]
# check ressults
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
hp disp [1,] 335 472 [2,] 300 450
# update join
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
DT[.(lims[i]), on = sprintf("%s>%s", cols[i], "V1"), (cols[i]) := lims[i]]
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
hp disp [1,] 335 472 [2,] 300 450
set()
# using `set()`
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
set(DT, which(DT[[cols[i]]] > lims[i]), cols[i], lims[i])
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
hp disp [1,] 335 472 [2,] 300 450
IMHO, this variant is the most straightforward approach.
Upvotes: 2
Reputation: 66819
Since Dan's answer doesn't use data.table syntax...
library(data.table)
# input
mylist = list(hp = 300, disp = 450)
DT = data.table(mtcars)
# update
DT[, names(mylist) := Map(pmin, .SD, mylist), .SDcols=names(mylist)]
Upvotes: 2
Reputation: 6073
This should do it:
first, get your limlist to be numeric instead of character:
lims <- as.numeric(sapply(limitlist, "[[", 2))
then you can loop:
for (i in 1:length(limitlist)) {
mtcars[[cols[i]]] <- pmin(mtcars[[cols[i]]], lims[i])
}
Upvotes: 0