Sam
Sam

Reputation: 13

Filter date range by group - R

I would like to filter data by date range by group ID. I have a dataframe that consists of an ID column (serial number), a datetime column, and a few columns of data. Each serial number has a unique date range I would like to pull out of the total dataset, hopefully returning two dataframes consisting of "pre-" and "post-" period data, respectively.

Specifically, I would like my "pre-" data frame to consist of the two days of data available for each serial number, and my "post-" data frame to consist of two days of data following a date seen in a separate "work_stop" dataframe.

To provide a reproducible example, we can use the following as input data:

sn <- c("XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF","XZY12CDF", "DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F","DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG", "JSDFRP49TG")
datetime <- c("12/1/2021", "12/2/2021", "12/4/2021", "12/5/2021", "12/6/2021", "12/7/2021", "1/7/2021", "1/8/2021", "1/9/2021", "1/10/2021", "1/11/2021", "1/12/2021", "6/13/2021", "6/14/2021", "6/15/2021", "6/16/2021", "6/17/2021", "6/18/2021")
x <- c("3", "8", "4", "6", "1", "1", "3", "5", "8", "9", "9", "7", "12", "4", "2", "2", "4", "5") 
df <- data.frame(sn, datetime, x)

sn <- c("XZY12CDF","DFFSJ487F","JSDFRP49TG")
stop_date <- c("12/5/2021", "1/10/2021", "6/16/2021")
work_stop <- data.frame(sn, stop_date)

And as an output I would like two dataframes that look like the following:

sn_pre <- c("XZY12CDF", "XZY12CDF", "DFFSJ487F", "DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG")
datetime_pre <- c("12/1/2021", "12/2/2021", "1/7/2021", "1/8/2021", "6/13/2021", "6/14/2021")
x_pre <- c("3", "8", "3", "5", "12", "4")
pre <- data.frame(sn_pre, datetime_pre, x_pre)

sn_post <- c("XZY12CDF", "XZY12CDF", "DFFSJ487F", "DFFSJ487F", "JSDFRP49TG", "JSDFRP49TG")
datetime_post <- c("12/6/2021", "12/7/2021","1/11/2021", "1/12/2021","6/17/2021", "6/18/2021")
x_post <- c("1", "1","9", "7","4", "5")
post <- data.frame(sn_post, datetime_post, x_post)

I am currently using group_by(sn) and group_split() (or just split(df, sn))to split the full dataframe into tibbles containing the data for each serial number, but am having trouble figuring out how to do my time splits to each tibble, or to the dataframe as a whole. Thanks!

Upvotes: 0

Views: 298

Answers (1)

Jon Spring
Jon Spring

Reputation: 66900

We can join them by "sn" and then use that date (once it's converted to a date type of data) to assign the group:

library(dplyr)
left_join(
  df %>% mutate(datetime = lubridate::mdy(datetime)),
  work_stop %>% mutate(stop_date = lubridate::mdy(stop_date))
  ) %>%
  mutate(group = if_else(datetime >= stop_date, "post", "pre")) %>%
  group_split(group)

Result

[[1]]
# A tibble: 9 × 5
  sn         datetime   x     stop_date  group
  <chr>      <date>     <chr> <date>     <chr>
1 XZY12CDF   2021-12-05 6     2021-12-05 post 
2 XZY12CDF   2021-12-06 1     2021-12-05 post 
3 XZY12CDF   2021-12-07 1     2021-12-05 post 
4 DFFSJ487F  2021-01-10 9     2021-01-10 post 
5 DFFSJ487F  2021-01-11 9     2021-01-10 post 
6 DFFSJ487F  2021-01-12 7     2021-01-10 post 
7 JSDFRP49TG 2021-06-16 2     2021-06-16 post 
8 JSDFRP49TG 2021-06-17 4     2021-06-16 post 
9 JSDFRP49TG 2021-06-18 5     2021-06-16 post 

[[2]]
# A tibble: 9 × 5
  sn         datetime   x     stop_date  group
  <chr>      <date>     <chr> <date>     <chr>
1 XZY12CDF   2021-12-01 3     2021-12-05 pre  
2 XZY12CDF   2021-12-02 8     2021-12-05 pre  
3 XZY12CDF   2021-12-04 4     2021-12-05 pre  
4 DFFSJ487F  2021-01-07 3     2021-01-10 pre  
5 DFFSJ487F  2021-01-08 5     2021-01-10 pre  
6 DFFSJ487F  2021-01-09 8     2021-01-10 pre  
7 JSDFRP49TG 2021-06-13 12    2021-06-16 pre  
8 JSDFRP49TG 2021-06-14 4     2021-06-16 pre  
9 JSDFRP49TG 2021-06-15 2     2021-06-16 pre  
  

Upvotes: 3

Related Questions