Reputation: 13
I would like to filter data by date range by group ID. I have a dataframe that consists of an ID column (serial number), a datetime column, and a few columns of data. Each serial number has a unique date range I would like to pull out of the total dataset, hopefully returning two dataframes consisting of "pre-" and "post-" period data, respectively.
Specifically, I would like my "pre-" data frame to consist of the two days of data available for each serial number, and my "post-" data frame to consist of two days of data following a date seen in a separate "work_stop" dataframe.
To provide a reproducible example, we can use the following as input data:
sn <- c("XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF", "DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG")
datetime <- c("12/1/2021", "12/2/2021", "12/4/2021", "12/5/2021", "12/6/2021", "12/7/2021", "1/7/2021", "1/8/2021", "1/9/2021", "1/10/2021", "1/11/2021", "1/12/2021", "6/13/2021", "6/14/2021", "6/15/2021", "6/16/2021", "6/17/2021", "6/18/2021")
x <- c("3", "8", "4", "6", "1", "1", "3", "5", "8", "9", "9", "7", "12", "4", "2", "2", "4", "5")
df <- data.frame(sn, datetime, x)
sn <- c("XZY12CDF","DFFSJ487F","JSDFRP49TG")
stop_date <- c("12/5/2021", "1/10/2021", "6/16/2021")
work_stop <- data.frame(sn, stop_date)
And as an output I would like two dataframes that look like the following:
sn_pre <- c("XZY12CDF", "XZY12CDF", "DFFSJ487F", "DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG")
datetime_pre <- c("12/1/2021", "12/2/2021", "1/7/2021", "1/8/2021", "6/13/2021", "6/14/2021")
x_pre <- c("3", "8", "3", "5", "12", "4")
pre <- data.frame(sn_pre, datetime_pre, x_pre)
sn_post <- c("XZY12CDF", "XZY12CDF", "DFFSJ487F", "DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG")
datetime_post <- c("12/6/2021", "12/7/2021","1/11/2021", "1/12/2021","6/17/2021", "6/18/2021")
x_post <- c("1", "1","9", "7","4", "5")
post <- data.frame(sn_post, datetime_post, x_post)
I am currently using group_by(sn)
and group_split()
(or just split(df, sn)
)to split the full dataframe into tibbles containing the data for each serial number, but am having trouble figuring out how to do my time splits to each tibble, or to the dataframe as a whole. Thanks!
Upvotes: 0
Views: 298
Reputation: 66900
We can join them by "sn" and then use that date (once it's converted to a date type of data) to assign the group:
library(dplyr)
left_join(
df %>% mutate(datetime = lubridate::mdy(datetime)),
work_stop %>% mutate(stop_date = lubridate::mdy(stop_date))
) %>%
mutate(group = if_else(datetime >= stop_date, "post", "pre")) %>%
group_split(group)
Result
[[1]]
# A tibble: 9 × 5
sn datetime x stop_date group
<chr> <date> <chr> <date> <chr>
1 XZY12CDF 2021-12-05 6 2021-12-05 post
2 XZY12CDF 2021-12-06 1 2021-12-05 post
3 XZY12CDF 2021-12-07 1 2021-12-05 post
4 DFFSJ487F 2021-01-10 9 2021-01-10 post
5 DFFSJ487F 2021-01-11 9 2021-01-10 post
6 DFFSJ487F 2021-01-12 7 2021-01-10 post
7 JSDFRP49TG 2021-06-16 2 2021-06-16 post
8 JSDFRP49TG 2021-06-17 4 2021-06-16 post
9 JSDFRP49TG 2021-06-18 5 2021-06-16 post
[[2]]
# A tibble: 9 × 5
sn datetime x stop_date group
<chr> <date> <chr> <date> <chr>
1 XZY12CDF 2021-12-01 3 2021-12-05 pre
2 XZY12CDF 2021-12-02 8 2021-12-05 pre
3 XZY12CDF 2021-12-04 4 2021-12-05 pre
4 DFFSJ487F 2021-01-07 3 2021-01-10 pre
5 DFFSJ487F 2021-01-08 5 2021-01-10 pre
6 DFFSJ487F 2021-01-09 8 2021-01-10 pre
7 JSDFRP49TG 2021-06-13 12 2021-06-16 pre
8 JSDFRP49TG 2021-06-14 4 2021-06-16 pre
9 JSDFRP49TG 2021-06-15 2 2021-06-16 pre
Upvotes: 3