timtamfan12
timtamfan12

Reputation: 7

Mutate dataframe based on day and night shifts in R

My goal is to mutate a dataframe by day and night shifts where a day shift is defined as 6:45-18:45 and a night shift is from 18:45-6:45. My issue is that I cannot manage to segment night shifts correctly as they occur over 2 different dates. My desired outcome is as follows

 date        time        shift
    17/08/2018  23:40:00 Night 1
    17/08/2018  23:56:00 Night 1
    18/08/2018  00:33:00 Night 1
    18/08/2018  04:02:00 Night 1
    18/08/2018  08:03:00 Day 1
    18/08/2018  12:25:00 Day 1
    18/08/2018  22:30:00 Night 2

However my code produces

 date        time        shift
    17/08/2018  23:40:00 Night 1
    17/08/2018  23:56:00 Night 1
    18/08/2018  00:33:00 Night 2
    18/08/2018  04:02:00 Night 2
    18/08/2018  08:03:00 Day 1
    18/08/2018  12:25:00 Day 1
    18/08/2018  22:30:00 Night 2

The code I have used is

am_shift_start = as.numeric(lubridate::hms("06:45:00"))
am_shift_end = as.numeric(lubridate::hms("18:45:00"))
merged_csv$DATE = as.Date(merged_csv$DATE, format = "%m%d%Y")
merged_csv = merged_csv %>%
  group_by(DATE) %>%
  mutate(shift = if_else((as.numeric(TIME) >= am_shift_end |
                            as.numeric(TIME) <= am_shift_start), "Night shift", "Day_shift")) %>%
  arrange(DATE, TIME)

where merged_csv is just the each daily csv of a given month merged into one. I tried to improve on the code naively with a forloop, but the ID counter doesnt work due to the if statement

static_id = 1
merged_csv$unique_shift = c(1,rep(-99, length(merged_csv[[1]])-1))
temp_date = merged_csv$DATE[1]
for(i in 2:length(merged_csv[[1]])){
  if(merged_csv$DATE[i] != temp_date){
    temp_date = merged_csv$DATE[i]
  }
#new shift as we are in a day shift or we are starting the next night shift >6:45pm
  if(merged_csv$shift[i] != merged_csv$shift[i-1] | (as.numeric(merged_csv$TIME[i]) > am_shift_end & merged_csv$shift[i] == "Night shift")){
    static_id = static_id + 1
  }
  merged_csv$unique_shift[i] = static_id
}

Is there any easier way to go about this (a working method even)? Thanks.

EDIT: adding the example data as a dataframe

merged_csv = data.frame(
  "date" =   c("17/08/2018",  "17/08/2018" , "18/08/2018", "18/08/2018",  "18/08/2018", "18/08/2018",  "18/08/2018"),
  "time" = c("23:40:00", "23:56:00", "00:33:00", "04:02:00", "08:03:00", "12:25:00", "22:30:00")
)
merged_csv$date = as.Date(merged_csv$date, format = "%d/%m/%Y")
merged_csv$time = lubridate::hms(merged_csv$time) #

Upvotes: 0

Views: 331

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389155

Here's one option :

library(dplyr)
library(lubridate)

df %>%
  tidyr::unite(datetime, date, time, sep = ' ') %>%
  arrange(datetime) %>%
  mutate(datetime = dmy_hms(datetime), 
         hour = hour(datetime), 
         mins = minute(datetime),
         shift = case_when(hour == 6 & mins >= 45 |
                                  between(hour, 7, 17) | 
                                  hour == 18 & mins <= 45 ~ 'Day', 
                                  TRUE ~ 'Night'), 
         day = data.table::rleid(shift)) %>%
  group_by(shift) %>%
  transmute(datetime, 
            result = paste(shift, match(day, unique(day)))) %>%
  ungroup() %>%
  select(-shift)

#  datetime            result 
#  <dttm>              <chr>  
#1 2020-08-17 23:40:00 Night 1
#2 2018-08-17 23:56:00 Night 1
#3 2018-08-18 00:33:00 Night 1
#4 2018-08-18 04:02:00 Night 1
#5 2018-08-18 08:03:00 Day 1  
#6 2018-08-18 12:25:00 Day 1  
#7 2018-08-18 22:30:00 Night 2

Combine date and time column, change them to POSIXct, extract hour and mins from it. Define 'day' and 'night' shifts based on required range of time and create an incremental sequence with match and unique.

data

df <- structure(list(date = c("17/08/2020", "17/08/2018", "18/08/2018", 
"18/08/2018", "18/08/2018", "18/08/2018", "18/08/2018"), time = c("23:40:00", 
"23:56:00", "00:33:00", "04:02:00", "08:03:00", "12:25:00", "22:30:00"
)), class = "data.frame", row.names = c(NA, -7L))

Upvotes: 0

r2evans
r2evans

Reputation: 160607

library(dplyr)
library(lubridate)

merged_csv %>%
  mutate(
    shift_i = findInterval(time, hms(c("00:00:00", "08:00:00", "18:30:00", "24:00:01"))),
    shift = paste(if_else(shift_i == 2L, "Day", "Night"),
                  date - min(date) + (shift_i == 3L))
  )
#         date       time shift_i   shift
# 1 2018-08-17 23H 40M 0S       3 Night 1
# 2 2018-08-17 23H 56M 0S       3 Night 1
# 3 2018-08-18     33M 0S       1 Night 1
# 4 2018-08-18   4H 2M 0S       1 Night 1
# 5 2018-08-18   8H 3M 0S       2   Day 1
# 6 2018-08-18 12H 25M 0S       2   Day 1
# 7 2018-08-18 22H 30M 0S       3 Night 2

The point is to differentiate the "shifts" into not just day and night, but night-before-midnight and night-after-midnight.

Upvotes: 1

Related Questions