User981636

Reputation: 3629

Allocating id/factor/categories based on a group of values for each factor

I have a large data table like the following:

id var1 var2
1   1   a
2   2   d
3   6   d
4   4   b
5   6   d
6   8   a

I need to assign a category in var2 based on the values in var1. The categories do not follow any order with respect to var1 values included in each category. For instance:

lista <- c(1,5,7)
listb <- c(4,9)
listd <- c(2,6)

I have tried two approaches unsuccessfully. The first uses the which function:

DT[which(var1 %in% lista), var2 := "a"]

and so on for listb and listd.

The function approach didn't work either (and it may be too slow for my large data table, since it would need many else-if clauses). I wrote:

matchfun <- function(value){
  if (var1 %in% lista){
    value <- as.character(a)} else {
    return(value)}}

Any ideas or comments on how to assign factors/categories to groups of values are very welcome.

Upvotes: 1

Views: 85

Answers (1)

Frank

Reputation: 66819

I'd suggest a merge here. Let DT be your original data table.

library(data.table)
DT <- data.table(id=1:6,var1=c(1,2,6,4,6,8))

First, you need to store your mapping in a table:

matchDT <- rbindlist(list( 
  data.table(var1=lista,var2="a"),
  data.table(var1=listb,var2="b"),
  data.table(var1=listd,var2="d")
))
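If there are many categories, writing one data.table(...) call per list gets tedious. As a rough sketch (the named list `groups` is a hypothetical name, not something from the question), the same mapping table could be built from a named list using rbindlist's idcol argument:

```r
library(data.table)

# Hypothetical container: one named element per category.
groups <- list(a = c(1, 5, 7), b = c(4, 9), d = c(2, 6))

# Turn each vector into a one-column table, then stack them;
# idcol stores the list names in a new column called var2.
matchDT <- rbindlist(
  lapply(groups, function(v) data.table(var1 = v)),
  idcol = "var2"
)

# Put var1 first so the layout matches the hand-built mapping table.
setcolorder(matchDT, c("var1", "var2"))
```

This should give the same seven rows as the hand-written version, so adding a category only means adding an element to the list.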

Then you can merge, optionally setting id as the key afterward to restore the original sorting.

setkey(DT,var1)
DT[matchDT,var2:=var2,nomatch=FALSE]
setkey(DT,id)
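As an alternative sketch, assuming a data.table version that supports the on= argument (1.9.6 or later), the same update can be written as a join without setting keys, which leaves the row order untouched. The i. prefix refers to the column coming from matchDT:

```r
library(data.table)

DT <- data.table(id = 1:6, var1 = c(1, 2, 6, 4, 6, 8))
matchDT <- rbindlist(list(
  data.table(var1 = c(1, 5, 7), var2 = "a"),
  data.table(var1 = c(4, 9),    var2 = "b"),
  data.table(var1 = c(2, 6),    var2 = "d")
))

# Update join: for every row of DT whose var1 appears in matchDT,
# copy matchDT's var2 (i.var2) into DT by reference.
# Rows of DT without a match keep NA in var2.
DT[matchDT, on = "var1", var2 := i.var2]
```

Because nothing is re-sorted, there is no need to restore the original order with a second setkey call.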

The result is

   id var1 var2
1:  1    1    a
2:  2    2    d
3:  3    6    d
4:  4    4    b
5:  5    6    d
6:  6    8   NA

The last value is NA because 8 is not in your lista object, although your example output suggests it should be.
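To spot gaps like this before assigning categories, one option is an anti-join, sketched here under the assumption that on= is available (data.table 1.9.6 or later). The name `unmatched` is only illustrative:

```r
library(data.table)

DT <- data.table(id = 1:6, var1 = c(1, 2, 6, 4, 6, 8))
matchDT <- data.table(
  var1 = c(1, 5, 7, 4, 9, 2, 6),
  var2 = c("a", "a", "a", "b", "b", "d", "d")
)

# Anti-join: keep the rows of DT whose var1 has no entry in matchDT.
unmatched <- DT[!matchDT, on = "var1"]
```

With the data above, `unmatched` should hold only the row with var1 equal to 8, the value missing from the mapping.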

Upvotes: 3
