Antipodean84
Antipodean84

Reputation: 43

R data.table replace values in multiple columns

I am new to R and new to stackoverflow. I am trying to figure out data.table and looked at "R data.table replacing an index of values from another data.table" and thought I understood but can't get what I want.

I have two data frames- the first is the data I'm interested in and the second is a key containing names/ids to translate the ids in the first data frame. I would like to use the "key" data.table to convert the numbers in table$id1 and table$id2 into the "Names" in the "key" data.table. Here's what I've managed so far:

table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6), 
              "id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
              "id2" = as.character(sample(1:6, 6))) 

key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"), 
            "id1" = as.character(1:6))

setkey(table, id1)
setkey(key, id1)

table[key, `:=`(id1 = i.Name)]

I've gotten this far (substituted values for names in table$id1) but can't figure out how to also change id2 without changing the column names, resetting the keys and re-doing the same step above for id2. In the real data set, there will be multiple Sally's, John's etc. and I want the code to "translate" both columns using the same key.

Hoping the code uses data.table (for learning purposes) but if there is another package that will do this better, that would be great too. Thanks!

Upvotes: 4

Views: 1957

Answers (2)

Kunal Puri
Kunal Puri

Reputation: 3427

Another method of doing this:

dt <- merge(table, key, by.x = c("id1"), by.y = c("id1"), sort = FALSE)

table <- merge(dt, key, by.x = c("id2"), by.y = c("id1"), sort = FALSE)

table[, id1 := Name.x]
table[, id2 := Name.y]
table[, Name.x := NULL]
table[, Name.y := NULL]

##      id2    id1 Sample Conc1 Conc2
##1:    Bob    Bob     41   101   200
##2: Kelsey   John     46   100   203
##3:  Roger  Molly     43   102   206
##4:  Sally Kelsey     42   105   201
##5:   John  Roger     44   106   202
##6:  Molly  Sally     45   104   204

Upvotes: 1

SymbolixAU
SymbolixAU

Reputation: 26258

In data.table you don't need to set the keys to do a join. You can specify the join columns in the on= argument.

And as of data.table v1.9.6 you can join on different column names using the on= argument.

library(data.table) ## v1.9.6 +

## update id1 based on Name
table[ key, on = c("id1"), nomatch = 0, id1 := i.Name]
## here the id1 column is getting updated to i.Name 
## (the 'i.' is the prefix given to columns on the 'right' side of the join).

## update id2 based on Name
table[ key, on = c(id2 = "id1"), nomatch = 0, id2 := i.Name]

table

#   Sample Conc1    id1 Conc2    id2
#1:     40   100   John   201   John
#2:     43   101 Kelsey   206 Kelsey
#3:     45   103  Molly   205  Roger
#4:     42   102  Roger   204    Bob
#5:     44   104  Sally   200  Molly
#6:     41   105    Bob   202  Sally

Data

## setting seed because we are sampling
set.seed(1234)
table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6), 
                  "id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
                  "id2" = as.character(sample(1:6, 6))) 

key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"), 
                "id1" = as.character(1:6))

Upvotes: 5

Related Questions