I have a question about using .SDcols in data.table to change values.
Here is the example data:
dt
A B C D
XX XY "" ""
ZZ ZA "" ""
What I want is to use .SDcols to change "" to NA.
I tried this:
dt[.SD == "", lapply(.SD, is.na), .SDcols = .(A, B, C, D)]
However, I got an error.
Any help would be appreciated.
Reputation: 25225
Using the more robust method from Frank's comments (which handles cases with no NAs), here are some timings for reference.
library(data.table)
library(microbenchmark)
set.seed(6L)
N <- 1e7
numCols <- 100
pctEmpty <- 0.25
ltrs <- sample(LETTERS, N, replace=TRUE)
ltrs[sample(N, pctEmpty*N)] <- ""
dt <- as.data.table(matrix(ltrs, ncol=numCols))
str(dt)
dt1 <- copy(dt)
dt2 <- copy(dt)
microbenchmark(
    Replace = dt1[, (names(dt1)) := lapply(.SD, function(x) replace(x, x == "", NA_character_)),
                  .SDcols = names(dt1)],
    Assign = dt2[, (names(dt2)) := lapply(.SD, function(x) { is.na(x) <- x == ""; x }),
                 .SDcols = names(dt2)],
    times = 10L)
# Unit: milliseconds
# expr min lq mean median uq max neval
# Replace 234.0141 240.0262 311.2857 268.2718 401.9364 410.1788 10
# Assign 273.1776 276.4123 344.1861 295.1337 435.8436 449.6495 10
The difference in timings is negligible (a median of about 268 ms for Replace versus about 295 ms for Assign over 10 runs). You can of course vary N, numCols and pctEmpty to see how the trade-off changes for your own data.
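For the small table in the question, the Replace variant timed above might be applied as in the sketch below. The way the table is built and the `cols` vector are assumptions made for illustration. The original attempt likely fails because .SDcols expects column names or positions rather than .(A, B, C, D), and because .SD is not available in the i argument.

```r
library(data.table)

# Small table mirroring the example data in the question (assumed construction)
dt <- data.table(A = c("XX", "ZZ"),
                 B = c("XY", "ZA"),
                 C = c("", ""),
                 D = c("", ""))

# .SDcols is given the column names as a character vector
cols <- c("A", "B", "C", "D")

# Update by reference: in each selected column, replace "" with NA
dt[, (cols) := lapply(.SD, function(x) replace(x, x == "", NA_character_)),
   .SDcols = cols]

print(dt)
```

Because := updates by reference, dt itself is modified and no reassignment is needed. Columns without empty strings (A and B here) are left unchanged.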