Mohit
Mohit

Reputation: 489

Match the columns of a table from another table to get the desired column in the main table in R

I have this data.table.

library(data.table)
class<- c("a","c","v","f","r","b","t","o");
value<-c(0.76,  0.91,   1.94,   0.37,   1.35,   0.75,   1.95,   1.69);
vehicle<-c("we",    "df",   "rt",   "yh",   "uj",   "er",   "ed","we")
carbon<-c(0.984,    0.27,   0.419,  0.469,  0.132,  0.865,  0.562,  0.133)
cap<-c(3,   2,  1,  6,  "y",    "t",    4,  6)
up<-c(4,    2,  3,  "d",    "t",    "y",    "u",    "i")
down<-c("t",    "e",    "r",    3,  4,  5,  2,  1)
amt<-c( 34, 23, 12, 67, 87, 43, 23, 12)
df<-data.table(class,value,vehicle,carbon,cap,up,down,amt)

and this another mapping table

up<-c("d","i",4)
vehicle<-c("yh",    "we",   "we")
exercise<-c("ty",45,    "k")
map<-data.table(cbind(vehicle,up,exercise))

i need the column exercise in the table df

I am currently using this code, which produces the desired results. and I am happy with it.

df[,names(map)[length(names(map))]:= 
                map$exercise[match(do.call(paste0,df[, which(names(df) %in% names(map)[1:(ncol(map)-1)]),with = FALSE]),
                                                   do.call(paste0,map[,1:(ncol(map)-1)]))] ]

So basically what this code does is.

  1. identify the columns from mapping table in the main table.
  2. concatenate those columns.
  3. do a match of these concatenated columns with the concatenated columns of mapping table.
  4. index the desired column from mapping table and fix it to the main table.

So the desired result is

> df$exercise
[1] "k"  NA   NA   "ty" NA   NA   NA   "45"

But sometimes columns of the mapping table order is changed.

for e.g. changed mapping table is Notice that now the order is up and then vehicle. and in this case the above code will not produce the desired result, infact it would be all NA.

up<-c("d","i",4)
vehicle<-c("yh",    "we",   "we")
exercise<-c("ty",45,    "k")
map<-as.data.frame(cbind(up,vehicle,exercise))
setDT(map)

So my code only works if the order of the columns in mapping table is same as in the main table. If my code can be changed to perform the same results but considering the order of the columns. ideally would want this as dynamic as possible.

mapping table can have as many columns as in the main table and an additional column which needs to be inserted in the main table.

Please comment if you need any further clarification. I would appreciate if my given code can be edited and provided. any other code is also welcome. I prefer data.table package use.

Upvotes: 2

Views: 230

Answers (2)

Mohit
Mohit

Reputation: 489

The code below works perfectly and should always be preferred.

setcolorder(map[df, on = .NATURAL], union(names(df), names(map)))[]

The other code in the answers to this question doesn't considers any irregularities in the mapping.

Thank you Merijn-van-tilborg for your valuable contribution.

Upvotes: 0

r2evans
r2evans

Reputation: 160792

I think the biggest issue here is not the basic mechanism of a merge/join, it's doing so when the columns to join on are not known, and the columns to join into df are also not known.

I suggest this is a reasonable approach:

df <- data.table::as.data.table(structure(list(class = c("a", "c", "v", "f", "r", "b", "t", "o"), value = c(0.76, 0.91, 1.94, 0.37, 1.35, 0.75, 1.95, 1.69), vehicle = c("we", "df", "rt", "yh", "uj", "er", "ed", "we"), carbon = c(0.984, 0.27, 0.419, 0.469, 0.132, 0.865, 0.562, 0.133), cap = c("3", "2", "1", "6", "y", "t", "4", "6"), up = c("4", "2", "3", "d", "t", "y", "u", "i"), down = c("t", "e", "r", "3", "4", "5", "2", "1"), amt = c(34, 23, 12, 67, 87, 43, 23, 12)), row.names = c(NA, -8L), class = c("data.table", "data.frame" )))
map <- data.table::as.data.table(structure(list(up = c("d", "i", "4"), vehicle = c("yh", "we", "we"), exercise = c("ty", "45", "k")), class = c("data.table", "data.frame"), row.names = c(NA, -3L)))

(same <- intersect(names(map), names(df)))
# [1] "up"      "vehicle"
(diff <- setdiff(names(map), names(df)))
# [1] "exercise"
if (length(same) && length(diff)) 
  df[map, c(diff) := mget(diff), on = same]
#     class value vehicle carbon    cap     up   down   amt exercise
#    <char> <num>  <char>  <num> <char> <char> <char> <num>   <char>
# 1:      a  0.76      we  0.984      3      4      t    34        k
# 2:      c  0.91      df  0.270      2      2      e    23     <NA>
# 3:      v  1.94      rt  0.419      1      3      r    12     <NA>
# 4:      f  0.37      yh  0.469      6      d      3    67       ty
# 5:      r  1.35      uj  0.132      y      t      4    87     <NA>
# 6:      b  0.75      er  0.865      t      y      5    43     <NA>
# 7:      t  1.95      ed  0.562      4      u      2    23     <NA>
# 8:      o  1.69      we  0.133      6      i      1    12       45

Granted, there are likely several things that can go wrong when there are unexpected columns in map or df.

I added the if (length(same) && length(diff)) portion so that we don't inadvertently try to join on zero columns or merge in no columns.


Edit: @MerijnvanTilborg says that df[map, on=.NATURAL] will work too ... I haven't tested all corners of it, but it appears to work as desired, recognizing that we need to capture its return value:

map[df, on = .NATURAL] # right output, but ...
df                     # ... but df is unchanged, so ...
df <- map[df, on = .NATURAL]

and repeated calls to this does the right thing (i.e., nothing more).

However, this will produce different results if there are any 1-to-many or many-to-1 relationships in the join. If that's happening, then either:

  • if this is a known 1-to-many relationship, then you may need to add allow.cartesian=TRUE inside the brackets; this will never break things if it's not needed, but make sure you know what it means before using it carte-blanche;
  • if this is unexpected (i.e., you expect 1-to-1 for all), then check your data for duplicates, bad data, and/or make sure you have all needed join-keys

Upvotes: 2

Related Questions