William Chiu

Reputation: 450

Filter R data frame by hour of the day

I have a data frame with a datetime column. I want to know the number of rows by hour of the day. However, I care only about the rows between 8 AM and 10 PM.

lubridate's hour() returns the hour on a 24-hour clock (0 to 23), so the filter has to be written with 24-hour integers.

library(tidyverse)
library(lubridate)

### Fake Data with Date-time ----
x <- seq.POSIXt(as.POSIXct('1999-01-01'), as.POSIXct('1999-02-01'), length.out=1000)

df <- data.frame(myDateTime = x)

### Get all rows between 8 AM and 10 PM (inclusive)

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= 8, myHour <= 22) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour) ## number of rows

Is there a way for me to use 10:00 PM rather than the integer 22?

Upvotes: 1

Views: 3783

Answers (2)

Ronak Shah

Reputation: 388982

You can also do this in base R:

# Extract the hour of the day (0-23)
df$hour_day <- as.numeric(format(df$myDateTime, "%H"))

# Convert the 12-hour clock strings to 24-hour integers
start_hour <- as.integer(format(as.POSIXct("08:00 AM", format = "%I:%M %p"), "%H"))
end_hour   <- as.integer(format(as.POSIXct("10:00 PM", format = "%I:%M %p"), "%H"))

# Subset rows between 08:00 AM and 10:00 PM (both inclusive)
new_df <- df[df$hour_day >= start_hour & df$hour_day <= end_hour, ]

# Count the frequency
stack(table(new_df$hour_day))

#   values ind
#1      42   8
#2      42   9
#3      41  10
#4      42  11
#5      42  12
#6      41  13
#7      42  14
#8      41  15
#9      42  16
#10     42  17
#11     41  18
#12     42  19
#13     42  20
#14     41  21
#15     42  22

This gives the same output as the tidyverse/lubridate approach:

library(tidyverse)
library(lubridate)

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")), 
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>%  
  count(myHour)
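
The 12-hour to 24-hour conversion in the base R snippet above can also be pulled into a small helper. This is only a sketch: `to_hour24` is a made-up name, and it assumes the session locale recognises "AM"/"PM" for `%p` (true for English and C locales).

```r
# Hypothetical helper (not part of base R): convert a 12-hour clock
# string such as "10:00 PM" to its 24-hour integer (0-23).
to_hour24 <- function(time12) {
  # tz = "UTC" sidesteps daylight-saving gaps; strptime() fills in today's
  # date, which does not matter because only the hour field is used.
  parsed <- strptime(time12, format = "%I:%M %p", tz = "UTC")
  parsed$hour
}

start_hour <- to_hour24("8:00 AM")
end_hour   <- to_hour24("10:00 PM")
```

With `df$hour_day` computed as above, the subset would then read `df[df$hour_day >= start_hour & df$hour_day <= end_hour, ]`.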

Upvotes: 2

William Chiu

Reputation: 450

You can use the ymd_hm and hour functions to do 12-hour to 24-hour conversions.

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hour(ymd_hm("2000-01-01 8:00 AM")), ## hour() ignores the year, month, and day
         myHour <= hour(ymd_hm("2000-01-01 10:00 PM"))) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour)

A more elegant solution is to wrap the conversion in a helper function:

## custom function to convert 12 hour time to 24 hour time

hourOfDay_12to24 <- function(time12hrFmt){
  out <- paste("2000-01-01", time12hrFmt)
  out <- hour(ymd_hm(out))
  out
}

df %>% 
  mutate(myHour = hour(myDateTime)) %>% 
  filter(myHour >= hourOfDay_12to24("8:00 AM"),
         myHour <= hourOfDay_12to24("10:00 PM")) %>%  ## between 8 AM and 10 PM (both inclusive)
  count(myHour)
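
One thing to keep in mind: because the filter compares whole hours, "10 PM inclusive" keeps everything from 22:00 through 22:59. If the cutoff should instead be 10:00 PM sharp, one option is to compare minutes since midnight. The sketch below uses three made-up timestamps; the strict variant is an assumption about what might be wanted, not something the question asks for.

```r
library(lubridate)

# Three illustrative timestamps (invented for this example)
ts <- ymd_hm(c("1999-01-01 07:59", "1999-01-01 22:00", "1999-01-01 22:30"))

# Hour-based filter, as in the answer: keeps the whole 10 PM hour
keep_by_hour <- hour(ts) >= 8 & hour(ts) <= 22

# Stricter variant: nothing after 10:00 PM sharp
minute_of_day <- hour(ts) * 60 + minute(ts)
keep_strict <- minute_of_day >= 8 * 60 & minute_of_day <= 22 * 60
```

Here `keep_by_hour` retains the 22:30 row while `keep_strict` drops it; both drop 07:59. For counting rows per hour, as in the question, the hour-based filter is the natural choice.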

Upvotes: 3
