Rasmus Larsen
Rasmus Larsen

Reputation: 6117

Best way to do nested filtering in dplyr

Nested filtering in dplyr?

The following two code chunks both work, but I am interested in whether the second code chunk (using dplyr) is could be done in one go using pipes? Or the way I have written it below is the intended/best way to do this.

In SQL i do this:

    library(sqldf)
    library(gapminder)


    sqldf( ' 
    select * from gapminder 
    where country in ( select country from gapminder where year > 2000 and lifeExp < 75) 
      and country in ( select country from gapminder where year < 2000 and lifeExp > 75 )
    ')

In dplyr I do it like this:

gapminder %>% filter(  year <  2000 & lifeExp > 75  ) %>% select(country) -> condition1
gapminder %>% filter(  year >= 2000 & lifeExp < 75  ) %>% select(country) -> condition2

gapminder %>%  filter( country %in% condition1$country, country %in% condition2$country )

Upvotes: 0

Views: 1965

Answers (1)

MrFlick
MrFlick

Reputation: 206253

An equivalent dplyr expression in this case would be

gapminder %>% 
    group_by(country) %>% 
    filter(any(year <  2000 & lifeExp > 75) & any(year >= 2000 & lifeExp < 75))

Here we search for both conditions within each grouped country

Upvotes: 6

Related Questions