Reputation: 43
I am new to R and new to stackoverflow. I am trying to figure out data.table and looked at "R data.table replacing an index of values from another data.table" and thought I understood but can't get what I want.
I have two data frames- the first is the data I'm interested in and the second is a key containing names/ids to translate the ids in the first data frame. I would like to use the "key" data.table to convert the numbers in table$id1 and table$id2 into the "Names" in the "key" data.table. Here's what I've managed so far:
table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6),
"id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
"id2" = as.character(sample(1:6, 6)))
key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"),
"id1" = as.character(1:6))
setkey(table, id1)
setkey(key, id1)
table[key, `:=`(id1 = i.Name)]
I've gotten this far (substituted values for names in table$id1) but can't figure out how to also change id2 without changing the column names, resetting the keys and re-doing the same step above for id2. In the real data set, there will be multiple Sally's, John's etc. and I want the code to "translate" both columns using the same key.
Hoping the code uses data.table (for learning purposes) but if there is another package that will do this better, that would be great too. Thanks!
Upvotes: 4
Views: 1957
Reputation: 3427
Another method of doing this:
dt <- merge(table, key, by.x = c("id1"), by.y = c("id1"), sort = FALSE)
table <- merge(dt, key, by.x = c("id2"), by.y = c("id1"), sort = FALSE)
table[, id1 := Name.x]
table[, id2 := Name.y]
table[, Name.x := NULL]
table[, Name.y := NULL]
## id2 id1 Sample Conc1 Conc2
##1: Bob Bob 41 101 200
##2: Kelsey John 46 100 203
##3: Roger Molly 43 102 206
##4: Sally Kelsey 42 105 201
##5: John Roger 44 106 202
##6: Molly Sally 45 104 204
Upvotes: 1
Reputation: 26258
In data.table
you don't need to set the keys to do a join. You can specify the join columns in the on=
argument.
And as of data.table v1.9.6
you can join on different column names using the on=
argument.
library(data.table) ## v1.9.6 +
## update id1 based on Name
table[ key, on = c("id1"), nomatch = 0, id1 := i.Name]
## here the id1 column is getting updated to i.Name
## (the 'i.' is the prefix given to columns on the 'right' side of the join).
## update id2 based on Name
table[ key, on = c(id2 = "id1"), nomatch = 0, id2 := i.Name]
table
# Sample Conc1 id1 Conc2 id2
#1: 40 100 John 201 John
#2: 43 101 Kelsey 206 Kelsey
#3: 45 103 Molly 205 Roger
#4: 42 102 Roger 204 Bob
#5: 44 104 Sally 200 Molly
#6: 41 105 Bob 202 Sally
Data
## setting seed because we are sampling
set.seed(1234)
table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6),
"id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
"id2" = as.character(sample(1:6, 6)))
key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"),
"id1" = as.character(1:6))
Upvotes: 5