dbo
dbo

Reputation: 1234

filtering based on two conditions: values less than on matching dates?

I'm missing some basic filtering know how. What are some dplyr or other ways to return the instances when value is, say, less than 300 on the same date, for both scenario a and scenario b?

library(tidyverse)
library(lubridate)

scenario <- c("a","a","a","a","a","a","a","a","a","a","a","a","a","a",
              "b","b","b","b","b","b","b","b","b","b","b","b","b","b")

str_tstep <- c("2/29/1924", "3/31/1924",    "4/30/1924",    "5/31/1924",    "6/30/1924",    "7/31/1924",
               "8/31/1924", "9/30/1924",    "10/31/1924",   "11/30/1924",   "12/31/1924",   "1/31/1925",
               "3/31/1926", "9/30/1926",    "1/31/1922",    "1/31/1924",    "2/29/1924",    "5/31/1924",
               "10/31/1924","11/30/1924",   "12/31/1924",   "1/31/1925",    "2/28/1925",    "1/31/1926",
               "2/28/1926", "3/31/1926",    "1/31/1927",    "1/31/1928")

tstep <- mdy(str_tstep)

value <- c(260,396,348,347,368,397,418,419,190,290,504,323,800,800,355,408,250,365,222,
           299,504,323,800,397,288,800,387,415)
df <- data.frame(scenario, tstep, value)

Upvotes: 2

Views: 370

Answers (2)

akrun
akrun

Reputation: 886968

We could filter all the 'value' that are less than 300 after grouping by 'tstep'

df %>% 
 group_by(tstep) %>%
 filter(all(value < 300))
# A tibble: 7 x 3
# Groups:   tstep [4]
#  scenario tstep               value
#  <fct>    <dttm>              <dbl>
#1 a        1924-02-29 00:00:00   260
#2 a        1924-10-31 00:00:00   190
#3 a        1924-11-30 00:00:00   290
#4 b        1924-02-29 00:00:00   250
#5 b        1924-10-31 00:00:00   222
#6 b        1924-11-30 00:00:00   299
#7 b        1926-02-28 00:00:00   288

If we the number of 'scenario' are less than 2 for some 'tstep' and we want to filter them out

df %>% 
  group_by(tstep) %>% 
  filter(n_distinct(scenario)== 2 , all(value < 300))
# A tibble: 6 x 3
# Groups:   tstep [3]
#  scenario tstep               value
#  <fct>    <dttm>              <dbl>
#1 a        1924-02-29 00:00:00   260
#2 a        1924-10-31 00:00:00   190
#3 a        1924-11-30 00:00:00   290
#4 b        1924-02-29 00:00:00   250
#5 b        1924-10-31 00:00:00   222
#6 b        1924-11-30 00:00:00   299

Upvotes: 2

Andrew Gustar
Andrew Gustar

Reputation: 18425

Something like this would do it (assuming I have interpreted the question correctly)...

df %>% filter(value<300) %>% #remove values 300+
       group_by(tstep) %>% 
       filter(all(c("a","b") %in% scenario)) #check both scenarios exist for each tstep

  scenario tstep      value
1 a        1924-02-29  260.
2 a        1924-10-31  190.
3 a        1924-11-30  290.
4 b        1924-02-29  250.
5 b        1924-10-31  222.
6 b        1924-11-30  299.

This will give you the dates that appear in BOTH a and b with values below 300 (unlike akrun's solution, which also includes those that only appear in just one of a or b).

Upvotes: 2

Related Questions