Pete

Subset dataframe using date/time factor variable

I expect this is a repeat question, but I have spent many hours trying to find a solution and would be very grateful for some help.

I have the variable timestamp in a data frame, currently stored as a factor. timestamp holds a date and time in the format dd/mm/yyyy hh:mm:ss:ssssss.
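For reference, here is a minimal base-R sketch of parsing this format without throwing away the sub-second part. It assumes the final six digits are fractions of a second and that the times are UTC. parse_timestamp is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: parse "dd/mm/yyyy hh:mm:ss:ssssss" values (character or factor)
# Assumes the last six digits are fractional seconds and the times are UTC
parse_timestamp <- function(x) {
  x <- as.character(x)                      # a factor must become character first
  x <- sub(":([0-9]{6})$", ".\\1", x)       # "00:02:00:250000" -> "00:02:00.250000"
  as.POSIXct(x, format = "%d/%m/%Y %H:%M:%OS", tz = "UTC")  # %OS reads fractional seconds
}

parsed <- parse_timestamp(factor(c("09/10/2017 00:02:00:000000",
                                   "09/10/2017 00:02:00:250000")))
diff(as.numeric(parsed))  # 0.25
```

Because the result is POSIXct, ordinary comparisons such as >= and <= then work chronologically.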

I would like to subset the data frame using the timestamp variable, for instance taking all rows between 09/10/2017 00:02:00 and 09/10/2017 00:06:00.

I have tried converting it to an ordered factor and to POSIXlt to help with the subsetting, but had no success.
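One likely reason an ordered factor does not help: by default a factor's levels follow string sort order, and for dd/mm/yyyy strings that is not chronological. The sketch below uses two made-up timestamps (10 September and 9 October 2017) to show the difference.

```r
# Two timestamps in the question's dd/mm/yyyy layout: 10 Sep 2017 and 9 Oct 2017
x <- c("10/09/2017 00:00:00:000000", "09/10/2017 00:00:00:000000")

# Default factor levels are sorted as strings, so 9 October is placed first
levels(factor(x))

# Parsed as real dates, 10 September correctly comes first
d <- as.Date(x, format = "%d/%m/%Y")
x[order(d)]
```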

Thank you in advance for any help.

df <- data.frame(
  timestamp = c("09/10/2017 00:00:00:000000", "09/10/2017 00:01:00:000000", "09/10/2017 00:02:00:000000",
                "09/10/2017 00:03:00:000000", "09/10/2017 00:04:00:000000", "09/10/2017 00:05:00:000000",
                "09/10/2017 00:06:00:000000", "09/10/2017 00:07:00:000000", "09/10/2017 00:08:00:000000",
                "09/10/2017 00:09:00:000000", "09/10/2017 00:10:00:000000", "09/10/2017 00:00:00:000000",
                "09/10/2017 00:01:00:000000", "09/10/2017 00:02:00:000000", "09/10/2017 00:03:00:000000",
                "09/10/2017 00:04:00:000000", "09/10/2017 00:05:00:000000", "09/10/2017 00:06:00:000000",
                "09/10/2017 00:07:00:000000", "09/10/2017 00:08:00:000000", "09/10/2017 00:09:00:000000",
                "09/10/2017 00:10:00:000000"),
  b = 1:22,
  stringsAsFactors = TRUE  # keep timestamp a factor (the default became FALSE in R 4.0.0)
)

Answers (1)

Maurits Evers

Here is a solution using the lubridate package:

library(lubridate)

# Convert the timestamps to POSIXct date-times; as.character() because timestamp is a factor.
# Stripping ":000000" only works because every microsecond field here is zero.
df$timestamp <- dmy_hms(gsub(":000000$", "", as.character(df$timestamp)))

# Query start/stop date-times (named so they don't mask base::stop or lubridate::interval)
start_time <- "09/10/2017 00:02:00"
stop_time  <- "09/10/2017 00:06:00"
query <- interval(dmy_hms(start_time), dmy_hms(stop_time))

# Return entries that fall within the query interval (both ends included)
df[df$timestamp %within% query, ]
#             timestamp  b
#3  2017-10-09 00:02:00  3
#4  2017-10-09 00:03:00  4
#5  2017-10-09 00:04:00  5
#6  2017-10-09 00:05:00  6
#7  2017-10-09 00:06:00  7
#14 2017-10-09 00:02:00 14
#15 2017-10-09 00:03:00 15
#16 2017-10-09 00:04:00 16
#17 2017-10-09 00:05:00 17
#18 2017-10-09 00:06:00 18

Alternatively, subset(df, timestamp %within% query) gives the same result. For more general use, it is best to wrap this in a function.
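Following the suggestion to wrap this in a function, here is a minimal sketch. It swaps lubridate's %within% for plain base-R >= and <= comparisons (the same closed interval) so it runs without extra packages. It assumes the timestamp column has already been converted to POSIXct and that times are UTC; subset_by_time and its arguments are hypothetical names.

```r
# Hypothetical helper: rows of `data` whose POSIXct column `col` lies in [from, to].
# Base-R comparisons stand in for lubridate's %within% (both ends included).
subset_by_time <- function(data, col, from, to,
                           format = "%d/%m/%Y %H:%M:%S", tz = "UTC") {
  start_t <- as.POSIXct(from, format = format, tz = tz)
  end_t   <- as.POSIXct(to,   format = format, tz = tz)
  if (is.na(start_t) || is.na(end_t)) {
    stop("'from' and 'to' must match 'format'")
  }
  keep <- data[[col]] >= start_t & data[[col]] <= end_t
  data[which(keep), , drop = FALSE]  # which() also drops rows with NA timestamps
}

# Example data shaped like the question's: two runs of 00:00 to 00:10, one row per minute
times_df <- data.frame(
  timestamp = as.POSIXct("2017-10-09", tz = "UTC") + 60 * c(0:10, 0:10),
  b = 1:22
)

res <- subset_by_time(times_df, "timestamp", "09/10/2017 00:02:00", "09/10/2017 00:06:00")
res$b  # 3 4 5 6 7 14 15 16 17 18
```

Keeping both bounds inclusive matches the %within% output above, where the 00:02 and 00:06 rows are both returned.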
