Reputation: 10543
I have a data.table
of measurements, each column has a lower detection limit, (and possibly an upper detection limit)
set.seed(1)
dt <- data.table(id=1:5, A=rnorm(5), B=rnorm(5, mean=2), C=rnorm(5,mean=-1))
setkey(dt, id)
# "randomly" disperse upper an lower limits to measurement columns
dt[3,A := -5]
dt[2,B := -3]
dt[5,B := 7]
dt[1,C := -10]
dt
id A B C
1: 1 -0.6264538 1.179532 -10.0000000
2: 2 0.1836433 -3.000000 -0.6101568
3: 3 -5.0000000 2.738325 -1.6212406
4: 4 1.5952808 2.575781 -3.2146999
5: 5 0.3295078 7.000000 0.1249309
I want to filter (set to NA
) out values in each column of dt
which exactly match the lower and upper measurement limits listed in another data.table
:
limits <- data.table(measurement=LETTERS[1:3], lower=c(-5,-3,-10),
upper=c(NA, 7, NA))
setkey(limits, measurement)
limits
measurement lower upper
1: A -5 NA
2: B -3 7
3: C -10 NA
My expected output is:
dt
id A B C
1: 1 -0.6264538 1.179532 NA
2: 2 0.1836433 NA -0.6101568
3: 3 NA 2.738325 -1.6212406
4: 4 1.5952808 2.575781 -3.2146999
5: 5 0.3295078 NA 0.1249309
I wasn't able to construct a nice solution to this, so at the moment I'm using a clungy for loop to get the job done:
for (i in 1:nrow(dt)) {
for (j in 2:ncol(dt)) {
if (is.na(dt[i, j, with=F])) {
next
} else if (dt[i, j, with=F] == limits[names(dt)[j]][, lower]) {
dt[i, j := NA_real_, with=F]
} else if (is.na(limits[names(dt)[j]][, upper])) {
next
} else if (dt[i, j, with=F] == limits[names(dt)[j]][, upper]) {
dt[i, j := NA_real_, with=F]
} else {
next
}
}
}
But there has to be something nicer and faster? I had a play around with apply
ing each column of the limits
data.table
to each row of dt
, but didn't have any success.
Upvotes: 2
Views: 395
Reputation: 52677
Here is an alternative:
dt[, 2:length(dt) := lapply(
2:length(dt),
function(x) ifelse(.SD[[x]] %in% limits[x - 1, c(lower, upper)], NA, .SD[[x]])
) ]
Since your rows in limits
are in the same order as the columns in dt
, you can just cycle through the columns:
id A B C
1: 1 -0.6264538 1.179532 NA
2: 2 0.1836433 NA -0.6101568
3: 3 NA 2.738325 -1.6212406
4: 4 1.5952808 2.575781 -3.2146999
5: 5 0.3295078 NA 0.1249309
Upvotes: 2
Reputation: 118849
First, I'd transpose your limiits
data.table as follows:
require(reshape2)
require(data.table)
limits = dcast.data.table(melt(limits, id=1), variable ~ measurement)
# variable A B C
# 1: lower -5 -3 -10
# 2: upper NA 7 NA
Then you can match corresponding columns for i
and replace those matches with NA
using set
as follows:
for (i in 2:ncol(dt)) {
set(dt, i=which(dt[[i]] %in% limits[[i]]), j=i, value=NA_real_)
}
# id A B C
# 1: 1 -0.6264538 1.179532 NA
# 2: 2 0.1836433 NA -0.6101568
# 3: 3 NA 2.738325 -1.6212406
# 4: 4 1.5952808 2.575781 -3.2146999
# 5: 5 0.3295078 NA 0.1249309
Upvotes: 6