Reputation: 359
I wanted to add a new variable depending on the time range between two variables. I want times between 8:01-20:00 = day and times between 20:01-8:00 = night, and anything that overlaps both to be mixed.
I've tried to add the variable manually but trying to understand can this be done an easier way.
#Current database
id<-c("m1","m1","m1","m2","m2","m2","m3","m4","m4")
x<-c("2020-01-03 10:00:00","2020-01-03 16:00:00","2020-01-03 19:20:00","2020-01-05 10:00:00","2020-01-05 15:20:00","2020-01-05 20:50:00","2020-01-06 06:30:00","2020-01-08 06:30:00","2020-01-08 07:50:00")
start<-strptime(x,"%Y-%m-%d %H:%M:%S")
y<-c("2020-01-03 16:00:00","2020-01-03 19:20:00","2020-01-03 20:50:00","2020-01-05 15:20:00","2020-01-05 20:50:00","2020-01-05 22:00:00","2020-01-06 07:40:00","2020-01-08 07:50:00","2020-01-08 08:55:00")
end<-strptime(y,"%Y-%m-%d %H:%M:%S")
mydata<-data.frame(id,start,end)
#output
day.night<-c("day","day","mixed","day","mixed","night","night","night","mixed")
newdata<-cbind(mydata,day.night)
Edit: Apologies I forgot to add the date.
Upvotes: 0
Views: 396
Reputation: 389155
One way using dplyr
is to convert start.time
and end.time
to POSIXct
object and then compare the values at various intervals and apply labels using case_when
.
library(dplyr)
data %>%
mutate(start.time1 = as.POSIXct(start.time, format = "%H:%M"),
end.time1 = as.POSIXct(end.time, format = "%H:%M"),
day.night = case_when(
start.time1 > as.POSIXct('08:01:00', format = "%T") &
end.time1 < as.POSIXct('20:00:00', format = "%T") ~"day",
start.time1 > as.POSIXct('20:01:00', format = "%T") |
start.time1 < as.POSIXct('08:00:00', format = "%T") &
end.time1 < as.POSIXct('08:00:00', format = "%T") ~ "night",
TRUE ~ "mixed")) %>%
select(names(data), day.night)
# id start.time end.time day.night
#1 m1 10:00 16:00 day
#2 m1 16:00 19:20 day
#3 m1 19:20 20:50 mixed
#4 m2 10:00 15:20 day
#5 m2 15:20 20:50 mixed
#6 m2 20:50 22:00 night
#7 m3 06:30 07:40 night
#8 m4 06:30 07:50 night
#9 m4 07:50 08:55 mixed
EDIT
If we also have date one way would be to replace the date component in start
and end
with todays date for comparison purposes.
library(dplyr)
library(lubridate)
mydata %>%
mutate_at(vars(start, end), ymd_hms) %>%
mutate(start_hour = hour(start),
end_hour = hour(end),
day.night = case_when(start_hour >= 8 & end_hour >= 8 & end_hour < 20 ~ "day",
start_hour >= 20 & (end_hour < 8 | end_hour <= 23) |
(start_hour < 8 & end_hour < 8)~ "night",
TRUE ~ "mixed"))
# id start end day.night
#1 m1 2020-01-03 10:00:00 2020-01-03 16:00:00 day
#2 m1 2020-01-03 16:00:00 2020-01-03 19:20:00 day
#3 m1 2020-01-03 19:20:00 2020-01-03 20:50:00 mixed
#4 m2 2020-01-05 10:00:00 2020-01-05 15:20:00 day
#5 m2 2020-01-05 15:20:00 2020-01-05 20:50:00 mixed
#6 m2 2020-01-05 20:50:00 2020-01-05 22:00:00 night
#7 m3 2020-01-06 06:30:00 2020-01-06 07:40:00 night
#8 m4 2020-01-08 06:30:00 2020-01-08 07:50:00 night
#9 m4 2020-01-08 07:50:00 2020-01-08 08:55:00 mixed
Upvotes: 2