Reputation: 143
How to use column index to dplyr::left_join
(and your family)?
Example (by column names):
library(dplyr)
data1 <- data.frame(var1 = c("a", "b", "c"), var2 = c("d", "d", "f"))
data2 = data.frame(alpha = c("d", "f"), beta = c(20, 30))
left_join(data1, data2, by = c("var2" = "alpha"))
However, replacing by = c("var2" = "alpha"))
to by = c(data1[,2] = data2[,1])
results to this error:
by
must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical.
I need to use the "column position" for loop on new functions. How can I do it?
Upvotes: 6
Views: 6993
Reputation: 6768
Using dplyr
:
# rename_at changes alpha into var2 in data2
left_join(data1, rename_at(data2, 1, ~ names(data1)[2]), by = names(data1)[2])
# output
var1 var2 beta
1 a d 20
2 b d 20
3 c f 30
Using base R
:
merge(data1, data2, by.x = 2, by.y = 1, all.x = T, all.y = F)
# output
var2 var1 beta
1 d a 20
2 d b 20
3 f c 30
Upvotes: 4
Reputation: 1513
I don't know how you're going to use the column index but a hacky solution is the following:
#make a named vector for the by argument, see ?left_join
join_var <- names(data2)[1] #change index here based on data2
names(join_var) <- names(data1)[2] #change index here based on data1
left_join(data1, data2, by = join_var)
Depending on the final output you desire by using the column index, there is probably a more appropriate solution than this.
Upvotes: 0