Alan
Alan

Reputation: 399

Filter df with datetime

I have a dataframe that I am trying to filter on with regards to two columns, a date and a time.

I currently have another function (get_datetime) that takes in a date with format 'YYYYMMDD' and a time 'HHMM' and returns a POSIXct object.

My filter condition is on if the datetime is within a certain number of hours (hour_diff), and this is what I currently have:

rows <- rows[abs(as.numeric(difftime(datetime, get_datetime(rows$file_date, rows$file_time), units='hours'))) <= hour_diff,]

get_datetime <- function(date, time) {
  #format date and time into POSIXct object
  year <- substr(landfall_date, 1, 4)
  month <- substr(landfall_date, 5, 6)
  day <- substr(landfall_date, 7, 8)
  hour <- substr(landfall_time, 1, nchar(landfall_time) - 2)
  minute <- substr(landfall_time, nchar(landfall_time) - 1, nchar(landfall_time))
  datetime <- as.POSIXct(paste0(year, '-', month, '-', day, ' ', hour, ':', minute, ':00'))
  return(datetime)
}

How do I pass get_datetime an individual date and time as opposed to the entire date and time columns, or is there an alternative way for me to properly filter the rows?

Here is some sample data:

enter image description here

structure(list(county = structure(1:6, .Label = c("beaufort", "bertie", "brunswick", "camden", "carteret", "chowan"), class = "factor"), file_date = c(19900724L, 19900724L, 19900725L, 19900725L, 19900726L, 19900726L), file_time = c(300L, 1200L, 1800L, 1800L, 1200L, 1800L)), class = "data.frame", row.names = c(NA, -6L))

datetime <- as.POSIXct('1990-07-25 12:00')
hour_diff <- 12

With the provided datetime above and 12 hours as the hour_diff, I would want to receive the middle 4 rows (bertie, brunswick, camden, carteret).

Upvotes: 1

Views: 54

Answers (1)

Jacky
Jacky

Reputation: 750

I recommend the package stringr and anytime to clean up your dates and times.

library(anytime)
library(stringr)
library(dplyr)
library(lubridate)

#pad your times that are less than 4 digits
df$file_time = str_pad(df$file_time,width=4,side = "left", pad= "0")

#convert your date and time to datetime 
df$new_dt = anytime(paste(df$file_date, df$file_time))

#create an hour flag
df$hour = hour(df$new_dt)

#filter to get your result:
df %>% filter( hour == '12')
    county file_date file_time              new_dt hour
1   bertie  19900724      1200 1990-07-24 12:00:00   12
2 carteret  19900726      1200 1990-07-26 12:00:00   12

OR if you wanted the range of datetimes between 1990-07-24 12:00:00 and 1990-07-26 12:00:00

df %>% filter(new_dt >= '1990-07-24 12:00:00' & new_dt <= '1990-07-26 12:00:00')
     county file_date file_time              new_dt hour
1    bertie  19900724      1200 1990-07-24 12:00:00   12
2 brunswick  19900725      1800 1990-07-25 18:00:00   18
3    camden  19900725      1800 1990-07-25 18:00:00   18
4  carteret  19900726      1200 1990-07-26 12:00:00   12

Upvotes: 1

Related Questions