Reputation: 35
I am looking at changing some single elements of a data.table based on a certain condition. Basically, if a element of the DT matches one element of the vector invalid_values
I want to set it to -99.
I have given a simple reproducible example below. Checking all values of the DT to match one value works fine, but when i switch to %in% it doesn't work as intended. I would like to avoid a for loop here.
DT = data.table(
a = rep(c(2,9), 5),
b = rep(c(7,5), 5),
c = rep(c(1,2), 5),
d = rep(c(4,8), 5)
)
invalid_values <- c(2, 7)
# wiith loop
for (i in seq(invalid_values)) {
DT[DT == invalid_values[i] <- -99]
}
# idea
DT[DT %in% invalid_values] <- -99
Any help appreciated.
Upvotes: 0
Views: 157
Reputation: 887118
We can use fcase
library(data.table)
DT[, lapply(.SD, function(x) fcase(x %in% invalid_values, -99, !x %in% invalid_values, x))]
Upvotes: 0
Reputation: 33488
If you only have one or two invalid values, you might want to use matrix subsetting available in base R with ==
:
setDF(DT)
DT[DT == invalid_values[1] | DT == invalid_values[2]] <- -99
setDT(DT)
The set*
functions allow you to switch between data.table and data.frame without much cost.
Using %in%
:
setDF(DT)
DT[matrix(as.matrix(DT) %in% invalid_values, nrow = nrow(DT))] <- -99
setDT(DT)
EDIT
As David Arenburg points out: if dealing with a big dataset a more step-by-step (and memory-friendly) solution might be more adequate than evaluating the whole matrix at once or lapply()
over everything.
for (j in names(DT)) set(DT, i = which(DT[[j]] %in% invalid_values), j = j, value = -99)
Upvotes: 1
Reputation: 388982
%in%
would not directly works on dataframe/data.table. Use lapply
to iterate over the columns and replace
the values which are invalid_values
to -99
.
library(data.table)
DT[, lapply(.SD, function(x) replace(x, x %in% invalid_values, -99))]
# a b c d
# 1: -99 -99 1 4
# 2: 9 5 -99 8
# 3: -99 -99 1 4
# 4: 9 5 -99 8
# 5: -99 -99 1 4
# 6: 9 5 -99 8
# 7: -99 -99 1 4
# 8: 9 5 -99 8
# 9: -99 -99 1 4
#10: 9 5 -99 8
Upvotes: 1