Reputation: 11
I have a data table "dt" like the following:
a b1 b2 b3 b4 t t1
1: 1 4 1 9 NA FALSE TRUE
2: 2 5 1 9 2 FALSE TRUE
3: 3 6 1 9 NA FALSE FALSE
library(data.table)
dt = setDT(structure(list(a = 1:3, b1 = 4:6, b2 = c(1L, 1L, 1L), b3 = c(9L,
9L, 9L), b4 = c(NA, 2L, NA), t = c(FALSE, FALSE, FALSE), t1 = c(TRUE,
TRUE, FALSE)), .Names = c("a", "b1", "b2", "b3", "b4", "t", "t1"
), row.names = c(NA, -3L), class = "data.frame"))
I want to create column t1 that is true if row-by-row the value of "a" is in any of the "b" columns such as:
dt[,t1 := a %in% c(b1,b2,b3,b4)]
However, I'm using this code in a function and the number of b columns may vary, so I'd like to refer to them dynamically.
I've tried something like this:
dt[,t := a %in% paste0("b" 1:4)]
but it looks for the values of a in the strings "b1", "b2", "b3", or "b4" and returns FALSE. Is there a way to refer to the columns of a data table dynamically in j?
Upvotes: 1
Views: 185
Reputation: 93833
Here's an attempt where you can use .SDcols
to specify b1:bn
columns, and then compare their values to that in the a
column:
dt[, Reduce(`+`, lapply(.SD, function(x) x==a & (!is.na(x)) )) > 0, .SDcols=b1:b4]
#[1] TRUE TRUE FALSE
Where dt
was:
dt <- data.table(a = 1:3, b1 = 4:6, b2 = c(1L, 1L, 1L), b3 = c(9L,
9L, 9L), b4 = c(NA, 2L, NA), t = c(FALSE, FALSE, FALSE), t1 = c(TRUE,
TRUE, FALSE))
Upvotes: 1
Reputation: 1392
This does what you want. It uses the grepl
function to grab the column numbers in which the column name has a b
in its first character. Then you use those column indexes to test whether the a
column value is equal to any of those in the b
columns.
Sum across the Boolean results of that comparison (avoiding NA
values) and test for the sum > 0 (telling us that at least one value was TRUE) and assign those values to the t1
column.
This also has the virtue (if it is one) that the b
columns do not have to be contiguous.
bcols = which(grepl('^b', names(dt)))
dt$t1 = apply((dt$a == dt[,bcols]), 1, sum, na.rm=T) > 0
Upvotes: 1