Sean G

Reputation: 385

Filter data frame by multiple criteria from different data frame

I'd like to filter one data frame ('data') based on several values in a different data frame ('key').

My 'key' looks like this:

library(dplyr)  # provides tibble(), left_join(), filter(), select() and %>%

exhibit.name  <- c("lions", "otters", "penguins")
exhibit.start <- as.Date(c("2016-04-01", "2016-05-01", "2016-06-01"))
exhibit.end   <- as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
key           <- tibble(exhibit.name, exhibit.start, exhibit.end)  # data_frame() is deprecated

And my 'data' looks like this:

exhibit.name <- c("lions", "lions", "otters",
                  "otters", "penguins", "penguins")
exhibit.date <- as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                          "2016-02-15", "2016-06-15", "2016-10-15"))
data         <- tibble(exhibit.name, exhibit.date)  # data_frame() is deprecated

I need to filter 'data' to return the rows where data$exhibit.name matches key$exhibit.name AND data$exhibit.date falls between the corresponding key$exhibit.start and key$exhibit.end dates. The resulting data frame would look like this:

> valid.exhibits
1|lions   |2016-04-15
2|otters  |2016-05-15
3|penguins|2016-06-15

Thanks!

Upvotes: 4

Views: 129

Answers (1)

akrun

Reputation: 887118

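We can do a left_join on exhibit.name and then filter on the date range:

library(dplyr)

data %>% 
   left_join(key, by = "exhibit.name") %>%   # attach each exhibit's start/end dates
   # keep dates inside the exhibit window (bounds treated as inclusive)
   filter(exhibit.date >= exhibit.start, exhibit.date <= exhibit.end) %>% 
   select(exhibit.name, exhibit.date)        # drop the start/end helper columns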
#     exhibit.name exhibit.date
#         <chr>       <date>
#1        lions   2016-04-15
#2       otters   2016-05-15
#3     penguins   2016-06-15

We can also use a non-equi (conditional) join from data.table. This needs v1.9.7+, which was the development version at the time of writing.

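library(data.table)
setDT(key)
# update join: flag rows of 'data' whose date lies inside the matching exhibit window
# (bounds treated as inclusive; this adds the column 'new' to 'data' by reference)
setDT(data)[key, on = .(exhibit.name, exhibit.date >= exhibit.start, 
          exhibit.date <= exhibit.end), new := 1]
# unmatched rows have new = NA; drop them, then remove the helper column from the result
na.omit(data)[, new := NULL][]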
#   exhibit.name exhibit.date
#1:        lions   2016-04-15
#2:       otters   2016-05-15
#3:     penguins   2016-06-15
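
If neither package is available, the same join-then-filter idea can be sketched in base R with merge(). This is only a minimal sketch: it rebuilds the question's sample data as plain data frames and assumes the start and end dates are meant to be inclusive.

```r
# Sample data from the question, rebuilt as base data frames
key <- data.frame(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30")),
  stringsAsFactors = FALSE
)
data <- data.frame(
  exhibit.name = c("lions", "lions", "otters",
                   "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15")),
  stringsAsFactors = FALSE
)

# Join on the exhibit name so every row carries its exhibit's date window
merged <- merge(data, key, by = "exhibit.name")

# Keep rows whose date falls inside the window (inclusive, by assumption)
in.window <- merged$exhibit.date >= merged$exhibit.start &
             merged$exhibit.date <= merged$exhibit.end
valid.exhibits <- merged[in.window, c("exhibit.name", "exhibit.date")]
rownames(valid.exhibits) <- NULL  # renumber rows after subsetting

print(valid.exhibits)
```

Note that merge() does an inner join by default, so rows of 'data' whose exhibit.name is missing from 'key' are dropped before the date check.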

Upvotes: 4
