Debbie
Debbie

Reputation: 401

Apply if else function looping through 2 vectors to manipulate a data table

I would like to apply an ifelse function looping through 2 vectors and then based on the condition update the columns of a data table. I'm looking for a solution which can work on large number of columns.

I'm demonstrating the problem through a toy dataset mtcars.

library(data.table)
mtcars <- data.table(mtcars)

Now, I would like to limit the values for some of the columns and replace the values of the corresponding columns with the defined limits. But the below code is giving me weird results.

limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- sapply(limitlist, "[[", 2)
for (i in length(limitlist)) mtcars[, c(cols) := lapply(.SD, function(x){ifelse(x[i] > lims[i], lims[i], x[i])}), .SDcols = cols]

My desired output:

    range(mtcars$hp)
[1]  52 300


    range(mtcars$disp)
[1]  71.1 450.0

I'm new to data.table syntax so might be a dumb error. Any help on this is highly appreciated.

Upvotes: 2

Views: 209

Answers (3)

Uwe
Uwe

Reputation: 42564

That is pretty advanced stuff for a newbie to data.table. However, here are three other variants:

  1. subsetting and updating
  2. update join
  3. using set()

These approaches update only the affected elements in the respective column vectors while the other solutions posted so far (Frank's, Dan Y's) replace the whole column. There might be a performance gain if only a few elements need to be replaced.

Note that we are using limitlist as provided by the OP.

1. subsetting and updating

# subsetting and updating
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  DT[get(cols[i]) > lims[i], (cols[i]) := lims[i]]
# check ressults
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

2. update join

# update join
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  DT[.(lims[i]), on = sprintf("%s>%s", cols[i], "V1"), (cols[i]) := lims[i]]
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

3. using set()

# using `set()`
library(data.table)
DT <- data.table(mtcars)
limitlist <- list(c("hp", 300), c("disp", 450.0))
cols <- sapply(limitlist, "[[", 1)
lims <- as.numeric(sapply(limitlist, "[[", 2))
for (i in seq_along(limitlist))
  set(DT, which(DT[[cols[i]]] > lims[i]), cols[i], lims[i])
# check results
sapply(cols, function(x) {cbind(max(mtcars[, x]), max(DT[[x]]))})
      hp disp
[1,] 335  472
[2,] 300  450

IMHO, this variant is the most straightforward approach.

Upvotes: 2

Frank
Frank

Reputation: 66819

Since Dan's answer doesn't use data.table syntax...

library(data.table)

# input
mylist = list(hp = 300, disp = 450)
DT = data.table(mtcars)

# update
DT[, names(mylist) := Map(pmin, .SD, mylist), .SDcols=names(mylist)]

Upvotes: 2

DanY
DanY

Reputation: 6073

This should do it:

first, get your limlist to be numeric instead of character:

lims <- as.numeric(sapply(limitlist, "[[", 2))

then you can loop:

for (i in 1:length(limitlist)) {
    mtcars[[cols[i]]] <- pmin(mtcars[[cols[i]]], lims[i])
} 

Upvotes: 0

Related Questions