Reputation: 489
I have this data.table.
library(data.table)
class<- c("a","c","v","f","r","b","t","o");
value<-c(0.76, 0.91, 1.94, 0.37, 1.35, 0.75, 1.95, 1.69);
vehicle<-c("we", "df", "rt", "yh", "uj", "er", "ed","we")
carbon<-c(0.984, 0.27, 0.419, 0.469, 0.132, 0.865, 0.562, 0.133)
cap<-c(3, 2, 1, 6, "y", "t", 4, 6)
up<-c(4, 2, 3, "d", "t", "y", "u", "i")
down<-c("t", "e", "r", 3, 4, 5, 2, 1)
amt<-c( 34, 23, 12, 67, 87, 43, 23, 12)
df<-data.table(class,value,vehicle,carbon,cap,up,down,amt)
and this another mapping table
up<-c("d","i",4)
vehicle<-c("yh", "we", "we")
exercise<-c("ty",45, "k")
map<-data.table(cbind(vehicle,up,exercise))
i need the column exercise
in the table df
I am currently using this code, which produces the desired results. and I am happy with it.
df[,names(map)[length(names(map))]:=
map$exercise[match(do.call(paste0,df[, which(names(df) %in% names(map)[1:(ncol(map)-1)]),with = FALSE]),
do.call(paste0,map[,1:(ncol(map)-1)]))] ]
So basically what this code does is.
So the desired result is
> df$exercise
[1] "k" NA NA "ty" NA NA NA "45"
But sometimes columns of the mapping table order is changed.
for e.g. changed mapping table is Notice that now the order is up and then vehicle. and in this case the above code will not produce the desired result, infact it would be all NA.
up<-c("d","i",4)
vehicle<-c("yh", "we", "we")
exercise<-c("ty",45, "k")
map<-as.data.frame(cbind(up,vehicle,exercise))
setDT(map)
So my code only works if the order of the columns in mapping table is same as in the main table. If my code can be changed to perform the same results but considering the order of the columns. ideally would want this as dynamic as possible.
mapping table can have as many columns as in the main table and an additional column which needs to be inserted in the main table.
Please comment if you need any further clarification. I would appreciate if my given code can be edited and provided. any other code is also welcome. I prefer data.table package use.
Upvotes: 2
Views: 230
Reputation: 489
The code below works perfectly and should always be preferred.
setcolorder(map[df, on = .NATURAL], union(names(df), names(map)))[]
The other code in the answers to this question doesn't considers any irregularities in the mapping.
Thank you Merijn-van-tilborg for your valuable contribution.
Upvotes: 0
Reputation: 160792
I think the biggest issue here is not the basic mechanism of a merge/join, it's doing so when the columns to join on are not known, and the columns to join into df
are also not known.
I suggest this is a reasonable approach:
df <- data.table::as.data.table(structure(list(class = c("a", "c", "v", "f", "r", "b", "t", "o"), value = c(0.76, 0.91, 1.94, 0.37, 1.35, 0.75, 1.95, 1.69), vehicle = c("we", "df", "rt", "yh", "uj", "er", "ed", "we"), carbon = c(0.984, 0.27, 0.419, 0.469, 0.132, 0.865, 0.562, 0.133), cap = c("3", "2", "1", "6", "y", "t", "4", "6"), up = c("4", "2", "3", "d", "t", "y", "u", "i"), down = c("t", "e", "r", "3", "4", "5", "2", "1"), amt = c(34, 23, 12, 67, 87, 43, 23, 12)), row.names = c(NA, -8L), class = c("data.table", "data.frame" )))
map <- data.table::as.data.table(structure(list(up = c("d", "i", "4"), vehicle = c("yh", "we", "we"), exercise = c("ty", "45", "k")), class = c("data.table", "data.frame"), row.names = c(NA, -3L)))
(same <- intersect(names(map), names(df)))
# [1] "up" "vehicle"
(diff <- setdiff(names(map), names(df)))
# [1] "exercise"
if (length(same) && length(diff))
df[map, c(diff) := mget(diff), on = same]
# class value vehicle carbon cap up down amt exercise
# <char> <num> <char> <num> <char> <char> <char> <num> <char>
# 1: a 0.76 we 0.984 3 4 t 34 k
# 2: c 0.91 df 0.270 2 2 e 23 <NA>
# 3: v 1.94 rt 0.419 1 3 r 12 <NA>
# 4: f 0.37 yh 0.469 6 d 3 67 ty
# 5: r 1.35 uj 0.132 y t 4 87 <NA>
# 6: b 0.75 er 0.865 t y 5 43 <NA>
# 7: t 1.95 ed 0.562 4 u 2 23 <NA>
# 8: o 1.69 we 0.133 6 i 1 12 45
Granted, there are likely several things that can go wrong when there are unexpected columns in map
or df
.
I added the if (length(same) && length(diff))
portion so that we don't inadvertently try to join on zero columns or merge in no columns.
Edit: @MerijnvanTilborg says that df[map, on=.NATURAL]
will work too ... I haven't tested all corners of it, but it appears to work as desired, recognizing that we need to capture its return value:
map[df, on = .NATURAL] # right output, but ...
df # ... but df is unchanged, so ...
df <- map[df, on = .NATURAL]
and repeated calls to this does the right thing (i.e., nothing more).
However, this will produce different results if there are any 1-to-many or many-to-1 relationships in the join. If that's happening, then either:
allow.cartesian=TRUE
inside the brackets; this will never break things if it's not needed, but make sure you know what it means before using it carte-blanche;Upvotes: 2