Reputation: 23
I have a dataset with start-time and end-time of using the service. About 1000 rows totally. I need to count the number of hours in any given interval.
Dataset:
court_number start_time end_time service 1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching
library(data.table)
dt <- data.table(court_number = c('court 2','court 3','court 1','court 5','court 5'),
start_time = c('2020-03-01 11:00:00', '2020-03-01 12:30:00', '2020-03-01 11:00:00', '2020-03-01 12:00:00', '2020-03-01 16:30:00'),
end_time = c('2020-03-01 12:30:00', '2020-03-01 13:30:00', '2020-03-01 13:00:00', '2020-03-01 16:00:00', '2020-03-01 18:30:00'),
service = c('booking','coaching','booking','booking','coaching'))
I want to count, for example, hours in the interval from "12:00" to "17:00". So, I need to create a column with hours from 12 to 17:
court_number start_time end_time service interval_12_17 1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 0,5 2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 1 3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 1 4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 4 5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching 0,5
I've read a lot of similar resolved issues here at stackoverflow, but due to the fact that I am newbie to R, it didn't help me =)
Upvotes: 2
Views: 45
Reputation: 887831
We convert the 'time' columns to Datetime
class, get the difftime
of pmin
of the 'Time' for 'end_time' and "17:00:00" and the pmax
of the 'start_time' and "12:00:00"
library(dplyr)
library(lubridate)
library(data.table)
dt %>%
mutate_at(vars(ends_with('time')), ymd_hms) %>%
mutate(interval_12_17 = difftime(pmin(as.ITime(end_time),
as.ITime("17:00:00")),
pmax(as.ITime(start_time),as.ITime("12:00:00")), unit = 'hour'))
# court_number start_time end_time service interval_12_17
#1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 0.5 hours
#2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 1.0 hours
#3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 1.0 hours
#4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 4.0 hours
#5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching 0.5 hours
Or using data.table
dt[, interval_12_17 := difftime(pmin(as.ITime(end_time), as.ITime("17:00:00")),
pmax(as.ITime(start_time),as.ITime("12:00:00")), unit = 'hour')][]
# court_number start_time end_time service interval_12_17
#1: court 2 2020-03-01 11:00:00 2020-03-01 12:30:00 booking 0.5 hours
#2: court 3 2020-03-01 12:30:00 2020-03-01 13:30:00 coaching 1.0 hours
#3: court 1 2020-03-01 11:00:00 2020-03-01 13:00:00 booking 1.0 hours
#4: court 5 2020-03-01 12:00:00 2020-03-01 16:00:00 booking 4.0 hours
#5: court 5 2020-03-01 16:30:00 2020-03-01 18:30:00 coaching 0.5 hours
Upvotes: 2