wellshux
wellshux

Reputation: 11

Compare values of one column of data table to dynamic number of columns in same table

I have a data table "dt" like the following:

    a b1 b2 b3 b4     t    t1
 1: 1  4  1  9 NA FALSE  TRUE
 2: 2  5  1  9  2 FALSE  TRUE
 3: 3  6  1  9 NA FALSE FALSE

library(data.table)
dt = setDT(structure(list(a = 1:3, b1 = 4:6, b2 = c(1L, 1L, 1L), b3 = c(9L, 
9L, 9L), b4 = c(NA, 2L, NA), t = c(FALSE, FALSE, FALSE), t1 = c(TRUE, 
TRUE, FALSE)), .Names = c("a", "b1", "b2", "b3", "b4", "t", "t1"
), row.names = c(NA, -3L), class = "data.frame"))

I want to create column t1 that is true if row-by-row the value of "a" is in any of the "b" columns such as:

dt[,t1 := a %in% c(b1,b2,b3,b4)]

However, I'm using this code in a function and the number of b columns may vary, so I'd like to refer to them dynamically.

I've tried something like this:

dt[,t := a %in% paste0("b" 1:4)]

but it looks for the values of a in the strings "b1", "b2", "b3", or "b4" and returns FALSE. Is there a way to refer to the columns of a data table dynamically in j?

Upvotes: 1

Views: 185

Answers (2)

thelatemail
thelatemail

Reputation: 93833

Here's an attempt where you can use .SDcols to specify b1:bn columns, and then compare their values to that in the a column:

dt[, Reduce(`+`, lapply(.SD, function(x) x==a & (!is.na(x)) )) > 0, .SDcols=b1:b4]
#[1]  TRUE  TRUE FALSE

Where dt was:

dt <- data.table(a = 1:3, b1 = 4:6, b2 = c(1L, 1L, 1L), b3 = c(9L, 
9L, 9L), b4 = c(NA, 2L, NA), t = c(FALSE, FALSE, FALSE), t1 = c(TRUE, 
TRUE, FALSE))

Upvotes: 1

Edward Carney
Edward Carney

Reputation: 1392

This does what you want. It uses the grepl function to grab the column numbers in which the column name has a b in its first character. Then you use those column indexes to test whether the a column value is equal to any of those in the b columns.

Sum across the Boolean results of that comparison (avoiding NA values) and test for the sum > 0 (telling us that at least one value was TRUE) and assign those values to the t1 column.

This also has the virtue (if it is one) that the b columns do not have to be contiguous.

bcols = which(grepl('^b', names(dt)))
dt$t1 = apply((dt$a == dt[,bcols]), 1, sum, na.rm=T) > 0

Upvotes: 1

Related Questions