crazybilly

Reputation: 3092

How to join a data frame to itself within a dplyr chain?

Occasionally, I need to join a data frame to (usually a modified) version of itself within a dplyr chain. Something like this:

library(dplyr)

df <- data.frame(
     id = c(1,2,3)
   , status = c('foo','bar','meh')
   , spouseid = c(4,3,2)
)


df %>% 
  filter( status == 'foo' | status == 'bar') %>% 
  # join the filtered table to itself using the dot as the right-hand side
  left_join(., by = c('id' = 'spouseid'))

When I try that, I get this error: Error in is.data.frame(y) : argument "y" is missing, with no default

Upvotes: 7

Views: 4187

Answers (1)

crazybilly

Reputation: 3092

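The problem is that when the dot appears as a direct argument, magrittr puts the left-hand side there instead of inserting it as the first argument. So the call as written above passes the lhs to left_join() only once, as x, and y is left missing. To use the lhs for both the left- and right-hand sides of the join, use the dot twice: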

df %>% 
  filter( status == 'foo' | status == 'bar') %>% 
  # the first dot is x argument and the second dot is the y argument
  left_join(
      x = . 
    , y = . 
    , by = c('id' = 'spouseid')
  )

This way, you pass the lhs explicitly to both arguments of left_join() rather than relying on magrittr's implicit first-argument insertion, as you normally would.
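
For the "modified version of itself" case mentioned in the question, here is a minimal sketch of one possible variation. It relies on magrittr's documented behaviour that a dot used only inside a nested call (here, inside select()) does not stop the lhs from also being inserted as the first argument. The filter condition and the spouse_status column name are made up for illustration; they are not from the original question.

```r
library(dplyr)

df <- data.frame(
    id = c(1, 2, 3)
  , status = c('foo', 'bar', 'meh')
  , spouseid = c(4, 3, 2)
)

result <- df %>%
  # hypothetical filter, chosen so that both spouses survive it
  filter(status != 'foo') %>%
  # the only dot sits inside select(), a nested call, so magrittr still
  # inserts the lhs as x; select(.) builds the modified copy used as y
  left_join(
      select(., id, spouse_status = status)
    , by = c('spouseid' = 'id')
  )

print(result)
```

With this data, each remaining row should pick up its spouse's status in the new spouse_status column. If you prefer to be explicit, writing x = . and y = select(., ...) should give the same result.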

Upvotes: 7
