Mike Tauber

Reputation: 59

R subset df based on multiple columns from another data frame

I am trying to find a more succinct way to filter a data frame using rows from another data frame (I am currently using a loop).

For example, suppose you have the following data frame df1 consisting of quantities of apples, pears, lemons and oranges. There is also a 5th column which we will call happiness.

library(gtools)  # permutations()
library(dplyr)   # %>%, filter(), bind_rows()
df1 <- data.frame(permutations(n = 4, r = 4, v = 1:4)) %>% cbind(sample(1:24))
colnames(df1) <- c("Apples", "Pears", "Lemons", "Oranges", "Happiness")

However, you wish to filter this data frame, keeping only the combinations of fruit that exist in a second data frame (whose columns are not necessarily in the same order):

df2 = data.frame(Apples = c(1, 3, 2, 4), Pears = c(4, 1, 1, 3), Lemons = c(2, 2, 3, 1), Oranges = c(3, 4, 4, 2))

Currently I am using a loop to apply each row of df2 as a filter condition one by one and then binding the results, e.g.:

df.ss <- list()
for (i in seq_len(nrow(df2))) {
  # Keep the rows of df1 that match row i of df2 on all four fruit columns
  df.ss[[i]] <- filter(df1,
                       Apples == df2$Apples[i] &
                       Pears == df2$Pears[i] &
                       Lemons == df2$Lemons[i] &
                       Oranges == df2$Oranges[i])
}

# Stack the per-row results into a single data frame
df.ss %>% bind_rows()

Is there a more elegant way of going about this?

Upvotes: 1

Views: 3612

Answers (1)

Julien Navarre

Reputation: 7830

I think you are looking for an inner join, which by default matches on every column name the two data frames have in common:

dplyr::inner_join(df1, df2)
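As a rough sketch of how the join behaves, the example below uses a small hand-written stand-in for df1 (the values are made up for illustration, not the asker's data) and a df2 whose columns are deliberately in a different order. The `keys` vector is a name introduced here; passing it as `by` just makes the matching columns explicit. `semi_join` is shown as a possible alternative, assuming you only want to filter df1 and never duplicate its rows when df2 contains repeated combinations.

```r
library(dplyr)

# Small stand-in for df1 (hypothetical values, for illustration only)
df1 <- data.frame(
  Apples    = c(1, 3, 2, 4, 1),
  Pears     = c(4, 1, 1, 3, 2),
  Lemons    = c(2, 2, 3, 1, 3),
  Oranges   = c(3, 4, 4, 2, 4),
  Happiness = c(10, 20, 30, 40, 50)
)

# Column order differs from df1 on purpose; joins match columns by name
df2 <- data.frame(
  Oranges = c(3, 4, 2),
  Lemons  = c(2, 2, 1),
  Pears   = c(4, 1, 3),
  Apples  = c(1, 3, 4)
)

# Columns that define a "combination of fruit"
keys <- c("Apples", "Pears", "Lemons", "Oranges")

# Rows of df1 whose fruit combination appears in df2
joined <- inner_join(df1, df2, by = keys)

# Same filtering, but each df1 row is returned at most once
filtered <- semi_join(df1, df2, by = keys)

print(joined)
print(filtered)
```

With these inputs both calls keep the same three rows of df1; the two would differ only if df2 contained the same combination more than once.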

Upvotes: 2
