SmurfAcco
SmurfAcco

Reputation: 35

Selecting single elements in data.table based on logical condition

I am looking at changing some single elements of a data.table based on a certain condition. Basically, if a element of the DT matches one element of the vector invalid_values I want to set it to -99.

I have given a simple reproducible example below. Checking all values of the DT to match one value works fine, but when i switch to %in% it doesn't work as intended. I would like to avoid a for loop here.

DT = data.table(
a = rep(c(2,9), 5),
b = rep(c(7,5), 5),
c = rep(c(1,2), 5),
d = rep(c(4,8), 5)
)

invalid_values <- c(2, 7)

# wiith loop
for (i in seq(invalid_values)) {
  DT[DT == invalid_values[i] <- -99]
}

# idea
DT[DT %in% invalid_values] <- -99

Any help appreciated.

Upvotes: 0

Views: 157

Answers (3)

akrun
akrun

Reputation: 887118

We can use fcase

library(data.table)
DT[, lapply(.SD, function(x) fcase(x %in% invalid_values, -99, !x %in% invalid_values, x))]

Upvotes: 0

s_baldur
s_baldur

Reputation: 33488

If you only have one or two invalid values, you might want to use matrix subsetting available in base R with ==:

setDF(DT)
DT[DT == invalid_values[1] | DT == invalid_values[2]] <- -99
setDT(DT)

The set* functions allow you to switch between data.table and data.frame without much cost.

Using %in%:

setDF(DT)
DT[matrix(as.matrix(DT) %in% invalid_values, nrow = nrow(DT))] <- -99
setDT(DT)

EDIT

As David Arenburg points out: if dealing with a big dataset a more step-by-step (and memory-friendly) solution might be more adequate than evaluating the whole matrix at once or lapply() over everything.

for (j in names(DT)) set(DT, i = which(DT[[j]] %in% invalid_values), j = j, value = -99)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

%in% would not directly works on dataframe/data.table. Use lapply to iterate over the columns and replace the values which are invalid_values to -99.

library(data.table)
DT[, lapply(.SD, function(x) replace(x, x %in% invalid_values, -99))]

#      a   b   c d
# 1: -99 -99   1 4
# 2:   9   5 -99 8
# 3: -99 -99   1 4
# 4:   9   5 -99 8
# 5: -99 -99   1 4
# 6:   9   5 -99 8
# 7: -99 -99   1 4
# 8:   9   5 -99 8
# 9: -99 -99   1 4
#10:   9   5 -99 8

Upvotes: 1

Related Questions