Reputation: 2783
I am trying to join two tables using dplyr within a function, where one of the variable names is defined by an argument to the function. In other dplyr functions, there is usually a version available for non-standard evaluation, e.g. select
& select_
, rename
and rename_
, etc, but not for the _join
family. I found this answer, but I cannot get it to work in my code below:
df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])
new_join <- function(df, sexvar){
df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])
# initial attempt using usual dplyr behaviour:
# left_join(df, df2, by = c(sexvar = 'sex'))
# attempt using NSE:
# left_join(df, df2,
# by = c(eval(substitute(var), list(var = as.name(sexvar)))) = 'sex'))
# attempt using setNames:
# left_join(df, df2, by = setNames(sexvar, 'sex'))
}
new_join(df1, 'gender')
The first and second attempt give the error
Error: 'sexvar' column not found in rhs, cannot join
while the last attempt gives the error
Error: 'gender' column not found in lhs, cannot join,
which at least shows it knows I want the column gender
, but somehow doesn't see it as a column heading.
Can anyone point out where I am going wrong?
Upvotes: 3
Views: 4080
Reputation: 3364
A oneliner in your block can look like this (which is similar to your last attempt)
left_join(df, df2, by = structure("sex", names = sexvar))
It is also possible to extend this to two varialbes
left_join(df, df2, by = structure(sexvarDF1, names = sexvarDF2))
Upvotes: 2
Reputation: 1222
Try:
df1 <- data.frame(gender = rep(c('M', 'F'), 5), var1 = letters[1:10])
new_join <- function(df, sexvar){
df2 <- data.frame(sex = rep(c('M', 'F'), 10), var2 = letters[20:1])
join_vars <- c('sex')
names(join_vars) <- sexvar
left_join(df, df2, by = join_vars)
}
new_join(df1, 'gender')
I'm sure there's a more elegant way of getting this to work using lazy evaluation, etc., but this should get you up-and-running in the meantime.
Upvotes: 4