How to insert missing dates/times using R based on criteria?

Question

A data frame like below. 3 staffs have hourly readings in days, but incomplete (every staff shall have 24 readings a day).

Understand that staffs had different number of readings on the days. Now only interested in the staff with most readings in the day.

There are many days. It’s wanted to insert the missing (hourly) rows for the most ones on the days. That is, 2018-03-02 to insert only for Jack’s, 2018-03-03 only for David and 2018-03-04 only for Kate.

I tried these lines from this question (even though they fill all without differentiation) but not getting there.

How can it be done in R?

date_time <- c("2/3/2018 0:00","2/3/2018 1:00","2/3/2018 2:00","2/3/2018 3:00","2/3/2018 5:00","2/3/2018 6:00","2/3/2018 7:00","2/3/2018 8:00","2/3/2018 9:00","2/3/2018 10:00","2/3/2018 11:00","2/3/2018 12:00","2/3/2018 13:00","2/3/2018 14:00","2/3/2018 16:00","2/3/2018 17:00","2/3/2018 18:00","2/3/2018 19:00","2/3/2018 21:00","2/3/2018 22:00","2/3/2018 23:00","3/3/2018 0:00","3/3/2018 0:00","3/3/2018 1:00","3/3/2018 2:00","3/3/2018 4:00","3/3/2018 5:00","3/3/2018 7:00","3/3/2018 8:00","3/3/2018 9:00","3/3/2018 11:00","3/3/2018 12:00","3/3/2018 14:00","3/3/2018 15:00","3/3/2018 17:00","3/3/2018 18:00","3/3/2018 20:00","3/3/2018 22:00","3/3/2018 23:00","4/3/2018 0:00","4/3/2018 0:00","4/3/2018 1:00","4/3/2018 2:00","4/3/2018 3:00","4/3/2018 5:00","4/3/2018 6:00","4/3/2018 7:00","4/3/2018 8:00","4/3/2018 10:00","4/3/2018 11:00","4/3/2018 12:00","4/3/2018 14:00","4/3/2018 15:00","4/3/2018 16:00","4/3/2018 17:00","4/3/2018 19:00","4/3/2018 20:00","4/3/2018 22:00","4/3/2018 23:00")
staff <- c("Jack","Jack","Kate","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Kate","Jack","Jack","Jack","David","David","Jack","Kate","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","Jack","Kate","David","David","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Jack")
reading <- c(7.5,8.3,7,6.9,7.1,8.1,8.4,8.8,6,7.1,8.9,7.3,7.4,6.9,11.3,18.8,4.6,6.7,7.7,7.8,7,7,6.6,6.8,6.7,6.1,7.1,6.3,7.2,6,5.8,6.6,6.5,6.4,7.2,8.4,6.5,6.5,5.5,6.7,7,7.5,6.5,7.5,7.2,6.3,7.3,8,7,8.2,6.5,6.8,7.5,7,6.1,5.7,6.7,4.3,6.3)
df <- data.frame(date_time, staff, reading)

akrun · Accepted Answer

The option would be to do this separately. Create a data.table of the dates of interest and the corresponding 'staff', and get the full sequence of date time, then we rbind this with the original dataset and using a condition, we summarise the data

library(data.table)
stf <- c("Jack", "David", "Kate")
date <- as.Date(c("2018-03-02", "2018-03-03", "2018-03-04"))
df1 <- data.table(date, staff= stf)[, .(date_time = seq(as.POSIXct(paste(date, "00:00:00"), 
       tz = "GMT"),
           length.out = 24, by = "1 hour")), staff]

setDT(df)[, date_time := as.POSIXct(date_time, "%d/%m/%Y %H:%M", tz = "GMT")]
res <- rbindlist(list(df, df1), fill = TRUE)[, 
     .(reading = if(any(is.na(reading))) sum(reading, na.rm = TRUE) else reading),
         .(staff, date_time)]

table(res$staff, as.Date(res$date_time))

#         2018-03-02 2018-03-03 2018-03-04
#  David          3         24          2
#  Jack          24          1          1
#  Kate           3          1         24

head(res)
#   staff           date_time reading
#1:  Jack 2018-03-02 00:00:00     7.5
#2:  Jack 2018-03-02 01:00:00     8.3
#3:  Kate 2018-03-02 02:00:00     7.0
#4:  Jack 2018-03-02 03:00:00     6.9
#5:  Jack 2018-03-02 05:00:00     7.1
#6:  Jack 2018-03-02 06:00:00     8.1

tail(res)
#   staff           date_time reading
#1:  Kate 2018-03-04 04:00:00       0
#2:  Kate 2018-03-04 09:00:00       0
#3:  Kate 2018-03-04 13:00:00       0
#4:  Kate 2018-03-04 18:00:00       0
#5:  Kate 2018-03-04 21:00:00       0
#6:  Kate 2018-03-04 23:00:00       0

How to insert missing dates/times using R based on criteria?

Answers (2)

Related Questions