Chewyham
Chewyham

Reputation: 87

How to filter by multiple time ranges in R?

I have a dataframe with a DateTime column and some other variables. Below is an example of it:

library(lubridate)
df <- tibble(DateTime = ymd_hms(c("2020-04-06 10:00:00", 
                                                    "2020-04-07 12:30:00", 
                                                    "2020-04-07 14:31:00", 
                                                    "2020-05-05 17:00:00")),
                               Var1 = c(1,2,3,4),
                               var2= c("a","b","c","d"))
df
# A tibble: 4 x 3
          DateTime             Var1 var2 
          <dttm>              <dbl> <chr>
        1 2020-04-06 10:00:00     1 a    
        2 2020-04-07 12:30:00     2 b    
        3 2020-04-07 14:31:00     3 c    
        4 2020-05-05 17:00:00     4 d   

I'd like to filter rows such that it returns a dataframe with the time element in DateTime being between:

Such that it returns the following dataframe:

# A tibble: 2 x 3
  DateTime             Var1 var2 
  <dttm>              <dbl> <chr>
1 2020-04-06 10:00:00     1 a    
2 2020-04-07 14:31:00     3 c 

I know I could do DateTime >= hms("09:15:00") & DateTime <= hms("12:00:00") and so on but that seems way too complicated. Is there an easier way to do it?

Upvotes: 1

Views: 1935

Answers (3)

Edward
Edward

Reputation: 18543

Using the "POSIXt" method of cut is one option:

library(dplyr)

df %>%
  filter(cut(as.POSIXct(paste0(Sys.Date(), format(DateTime, "%H:%M:%S"))), 
             breaks=as.POSIXct(
               c("09:15:00","12:00:00","13:15:00","16:30:00","17:15:00","01:00:00"), 
               format="%H:%M:%S"), labels=FALSE) %in% c(2,4,6))

# A tibble: 2 x 3
  DateTime             Var1 var2 
  <dttm>              <dbl> <chr>
1 2020-04-06 10:00:00     1 a    
2 2020-04-07 14:31:00     3 c 

A bit clumsy, but no need for any external packages. Might have to adjust the breaks so that the ends are included, maybe go back 1 second.

Upvotes: 1

Ian Campbell
Ian Campbell

Reputation: 24770

After thinking about it some more, you might use the interval functions from lubridate.

Unfortunately, they only work on datetime objects. So you'll have to be a little hacky and add on an arbitrary date:

library(dplyr)
library(lubridate)
library(purrr)
my.intervals <- c(dmy_hms("1/1/00 09:15:00") %--% dmy_hms("1/1/00 12:00:00"),
                  dmy_hms("1/1/00 13:15:00") %--% dmy_hms("1/1/00 16:30:00"),
                  dmy_hms("1/1/00 17:15:00") %--% dmy_hms("1/1/00 23:59:59"),
                  dmy_hms("1/1/00 00:00:00") %--% dmy_hms("1/1/00 01:00:00"))

df %>%
  mutate(Time = dmy_hms(paste0("01/01/00 ",format(DateTime,"%H:%M:%S")))) %>% 
  dplyr::filter(map_lgl(Time, ~ any(.x %within% my.intervals))) %>%
  dplyr::select(-Time)
# A tibble: 2 x 3
  DateTime             Var1 var2 
  <dttm>              <dbl> <chr>
1 2020-04-06 10:00:00     1 a    
2 2020-04-07 14:31:00     3 c   

Upvotes: 2

r2evans
r2evans

Reputation: 160407

Try this:

# library(hms) # hms::as.hms
df %>%
  mutate(DT2 = hms::as.hms(DateTime, tz = "UTC")) %>%
  filter(between(DT2, hms("09:15:00"), hms("12:00:00")) |
           between(DT2, hms("13:15:00"), hms("16:30:00")) |
           DT2 > hms("17:15:00") | DT2 < hms("01:00:00"))
# # A tibble: 2 x 4
#   DateTime             Var1 var2  DT2   
#   <dttm>              <dbl> <chr> <time>
# 1 2020-04-06 10:00:00     1 a     10:00 
# 2 2020-04-07 14:31:00     3 c     14:31 

You can't really do a direct comparison of POSIXt with lubridate's Period class (apparently). Also, I used tz="UTC" since the sample data has one method for assuming time zone and the hms package must have a different assumption. You may need to play with that argument to make sure it matches what you expect from your data.

Upvotes: 1

Related Questions