Reputation: 1234
I'm missing some basic filtering know how. What are some dplyr
or other ways to return the instances when value
is, say, less than 300
on the same date, for both scenario a
and scenario b
?
library(tidyverse)
library(lubridate)
scenario <- c("a","a","a","a","a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b","b","b","b","b")
str_tstep <- c("2/29/1924", "3/31/1924", "4/30/1924", "5/31/1924", "6/30/1924", "7/31/1924",
"8/31/1924", "9/30/1924", "10/31/1924", "11/30/1924", "12/31/1924", "1/31/1925",
"3/31/1926", "9/30/1926", "1/31/1922", "1/31/1924", "2/29/1924", "5/31/1924",
"10/31/1924","11/30/1924", "12/31/1924", "1/31/1925", "2/28/1925", "1/31/1926",
"2/28/1926", "3/31/1926", "1/31/1927", "1/31/1928")
tstep <- mdy(str_tstep)
value <- c(260,396,348,347,368,397,418,419,190,290,504,323,800,800,355,408,250,365,222,
299,504,323,800,397,288,800,387,415)
df <- data.frame(scenario, tstep, value)
Upvotes: 2
Views: 370
Reputation: 886968
We could filter
all
the 'value' that are less than 300 after grouping by 'tstep'
df %>%
group_by(tstep) %>%
filter(all(value < 300))
# A tibble: 7 x 3
# Groups: tstep [4]
# scenario tstep value
# <fct> <dttm> <dbl>
#1 a 1924-02-29 00:00:00 260
#2 a 1924-10-31 00:00:00 190
#3 a 1924-11-30 00:00:00 290
#4 b 1924-02-29 00:00:00 250
#5 b 1924-10-31 00:00:00 222
#6 b 1924-11-30 00:00:00 299
#7 b 1926-02-28 00:00:00 288
If we the number of 'scenario' are less than 2 for some 'tstep' and we want to filter them out
df %>%
group_by(tstep) %>%
filter(n_distinct(scenario)== 2 , all(value < 300))
# A tibble: 6 x 3
# Groups: tstep [3]
# scenario tstep value
# <fct> <dttm> <dbl>
#1 a 1924-02-29 00:00:00 260
#2 a 1924-10-31 00:00:00 190
#3 a 1924-11-30 00:00:00 290
#4 b 1924-02-29 00:00:00 250
#5 b 1924-10-31 00:00:00 222
#6 b 1924-11-30 00:00:00 299
Upvotes: 2
Reputation: 18425
Something like this would do it (assuming I have interpreted the question correctly)...
df %>% filter(value<300) %>% #remove values 300+
group_by(tstep) %>%
filter(all(c("a","b") %in% scenario)) #check both scenarios exist for each tstep
scenario tstep value
1 a 1924-02-29 260.
2 a 1924-10-31 190.
3 a 1924-11-30 290.
4 b 1924-02-29 250.
5 b 1924-10-31 222.
6 b 1924-11-30 299.
This will give you the dates that appear in BOTH a
and b
with values below 300 (unlike akrun's solution, which also includes those that only appear in just one of a
or b
).
Upvotes: 2