Marcelo Fernandes

Reputation: 143

R - How to use dplyr left_join by column index?

How can I join by column index with dplyr::left_join (and the rest of the join family)?

Example (by column names):

    library(dplyr)
    data1 <- data.frame(var1 = c("a", "b", "c"), var2 = c("d", "d", "f"))
    data2 <- data.frame(alpha = c("d", "f"), beta = c(20, 30))
    left_join(data1, data2, by = c("var2" = "alpha"))

However, replacing by = c("var2" = "alpha") with by = c(data1[,2] = data2[,1]) results in this error:

by must be a (named) character vector, list, or NULL for natural joins (not recommended in production code), not logical.

I need to refer to columns by position so that I can loop over them in new functions. How can I do it?

Upvotes: 6

Views: 6993

Answers (2)

nghauran

Reputation: 6768

Using dplyr:

    # rename_at renames alpha to var2 in data2
    left_join(data1, rename_at(data2, 1, ~ names(data1)[2]), by = names(data1)[2])
    # output
      var1 var2 beta
    1    a    d   20
    2    b    d   20
    3    c    f   30
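
In dplyr 1.0.0 and later, rename_at() is superseded in favour of rename_with(). A minimal sketch of the same idea with the newer verb, assuming the data frames from the question (the variable name key is only for illustration):

```r
library(dplyr)

data1 <- data.frame(var1 = c("a", "b", "c"), var2 = c("d", "d", "f"))
data2 <- data.frame(alpha = c("d", "f"), beta = c(20, 30))

# Name of the join column in data1, looked up by position
key <- names(data1)[2]

# Rename column 1 of data2 to that name, then join on the shared name
result <- left_join(data1, rename_with(data2, ~ key, .cols = 1), by = key)
result
```

The result should have the same columns as the rename_at() version above (var1, var2, beta).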

Using base R:

    # by.x and by.y accept column positions directly
    merge(data1, data2, by.x = 2, by.y = 1, all.x = TRUE, all.y = FALSE)
    # output
      var2 var1 beta
    1    d    a   20
    2    d    b   20
    3    f    c   30

Upvotes: 4

EJJ

Reputation: 1513

I don't know how you plan to use the column index, but a hacky solution is the following:

    # make a named vector for the by argument, see ?left_join
    join_var <- names(data2)[1]        # change the index here based on data2
    names(join_var) <- names(data1)[2] # change the index here based on data1

    left_join(data1, data2, by = join_var)

Depending on what you ultimately want to do with the column index, there is probably a more appropriate solution than this.
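
Since the question mentions using positions inside loops and new functions, the named-vector approach could be wrapped in a small helper. This is only a sketch: left_join_by_index and its arguments x_idx and y_idx are hypothetical names, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): left join x and y,
# identifying the key columns by position instead of by name.
left_join_by_index <- function(x, y, x_idx, y_idx) {
  # Build the named character vector that `by` expects:
  # names come from x, values come from y
  by <- setNames(names(y)[y_idx], names(x)[x_idx])
  left_join(x, y, by = by)
}

data1 <- data.frame(var1 = c("a", "b", "c"), var2 = c("d", "d", "f"))
data2 <- data.frame(alpha = c("d", "f"), beta = c(20, 30))

# Join column 2 of data1 to column 1 of data2
result <- left_join_by_index(data1, data2, x_idx = 2, y_idx = 1)
result
```

Because x_idx and y_idx are indexed into names(), they can also be vectors of the same length if the join should use several columns.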

Upvotes: 0
