Timothée HENRY

Reputation: 14614

R loop over columns to calculate the number of rows that have levels in a different subset

> x <- data.table( C1=c('a','b','c','d') )
> y <- data.table( C1=c('a','b','b','a') )
> f="C1"
> x[ C1 %in% unique(y$C1),]
   C1
1:  a
2:  b

So the values found in y$C1 match 2 rows of x$C1.

> y[ C1 %in% unique(x$C1),]
   C1
1:  a
2:  b
3:  b
4:  a

Likewise, the values found in x$C1 match all 4 rows of y$C1.

This works, but I would like to use a variable for the column name so that I can build a loop when there are many columns. The following does not work:

> y[ f %in% unique(x$C1),]
Empty data.table (0 rows) of 1 col: C1

Upvotes: 1

Views: 126

Answers (2)

akrun

Reputation: 887291

You could also use:

 f <- quote(C1)
 y[ eval(f) %in% unique(x$C1),]
 #    C1
 #1:  a
 #2:  b
 #3:  b
 #4:  a
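If the column name arrives as a string (as in the question) rather than as a quoted symbol, one possible variation is to convert it with as.name() first. This is only a sketch under the assumption that the data are the x and y from the question; the name f_sym is made up for illustration.

```r
library(data.table)

# Same data as in the question
x <- data.table(C1 = c('a', 'b', 'c', 'd'))
y <- data.table(C1 = c('a', 'b', 'b', 'a'))

f <- "C1"               # column name held as a character string
f_sym <- as.name(f)     # hypothetical helper name; same symbol as quote(C1)

# eval(f_sym) resolves to the column C1 inside y;
# x[[f]] extracts the column from x by its string name
res <- y[eval(f_sym) %in% unique(x[[f]]), ]
n_rows <- nrow(res)
n_rows
```

Because f stays a plain string, the same lines can be reused inside a loop over several column names.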

Upvotes: 2

bjoseph

Reputation: 2166

This works:

 y[ get(f) %in% unique(x$C1),] 

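The reason is that f itself holds the string "C1", so f %in% unique(x$C1) tests whether the literal string "C1" is one of the values in x$C1. It is not, so no rows are selected.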

f
 [1] "C1"

class(f)
 [1] "character"

You need to refer to the column C1 in the data.table itself, and get(f) does that by looking up the object whose name is stored in f.

Below is an illustration of how get works:

a <- 1:10
b <- "a"
print(b)
 [1] "a"
print(get(b))
  [1]  1  2  3  4  5  6  7  8  9 10
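To connect this back to the loop the question asks for, here is a minimal sketch that applies get(f) to each shared column and counts the matching rows. The second column C2 and its values are invented for illustration, as are the names cols and covered; they are not part of the original question.

```r
library(data.table)

# C1 is taken from the question; C2 is a made-up second column
x <- data.table(C1 = c('a', 'b', 'c', 'd'), C2 = c('p', 'q', 'r', 's'))
y <- data.table(C1 = c('a', 'b', 'b', 'a'), C2 = c('p', 'p', 'z', 'z'))

# Columns present in both tables
cols <- intersect(names(x), names(y))

# For each column name f, count the rows of y whose value also occurs in x
covered <- sapply(cols, function(f) {
  y[get(f) %in% unique(x[[f]]), .N]
})
covered
# C1 C2
#  4  2
```

If only the counts are needed, sum(y[[f]] %in% x[[f]]) should give the same numbers without get(), since [[ accepts the column name as a string.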

Upvotes: 2
