Reputation: 9348
A data frame like below. 3 staffs have hourly readings in days, but incomplete (every staff shall have 24 readings a day).
Understand that staffs had different number of readings on the days. Now only interested in the staff with most readings in the day.
There are many days. It’s wanted to insert the missing (hourly) rows for the most ones on the days. That is, 2018-03-02 to insert only for Jack’s, 2018-03-03 only for David and 2018-03-04 only for Kate.
I tried these lines from this question (even though they fill all without differentiation) but not getting there.
How can it be done in R?
date_time <- c("2/3/2018 0:00","2/3/2018 1:00","2/3/2018 2:00","2/3/2018 3:00","2/3/2018 5:00","2/3/2018 6:00","2/3/2018 7:00","2/3/2018 8:00","2/3/2018 9:00","2/3/2018 10:00","2/3/2018 11:00","2/3/2018 12:00","2/3/2018 13:00","2/3/2018 14:00","2/3/2018 16:00","2/3/2018 17:00","2/3/2018 18:00","2/3/2018 19:00","2/3/2018 21:00","2/3/2018 22:00","2/3/2018 23:00","3/3/2018 0:00","3/3/2018 0:00","3/3/2018 1:00","3/3/2018 2:00","3/3/2018 4:00","3/3/2018 5:00","3/3/2018 7:00","3/3/2018 8:00","3/3/2018 9:00","3/3/2018 11:00","3/3/2018 12:00","3/3/2018 14:00","3/3/2018 15:00","3/3/2018 17:00","3/3/2018 18:00","3/3/2018 20:00","3/3/2018 22:00","3/3/2018 23:00","4/3/2018 0:00","4/3/2018 0:00","4/3/2018 1:00","4/3/2018 2:00","4/3/2018 3:00","4/3/2018 5:00","4/3/2018 6:00","4/3/2018 7:00","4/3/2018 8:00","4/3/2018 10:00","4/3/2018 11:00","4/3/2018 12:00","4/3/2018 14:00","4/3/2018 15:00","4/3/2018 16:00","4/3/2018 17:00","4/3/2018 19:00","4/3/2018 20:00","4/3/2018 22:00","4/3/2018 23:00")
staff <- c("Jack","Jack","Kate","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Jack","Kate","Jack","Jack","Jack","David","David","Jack","Kate","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","David","Jack","Kate","David","David","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Kate","Jack")
reading <- c(7.5,8.3,7,6.9,7.1,8.1,8.4,8.8,6,7.1,8.9,7.3,7.4,6.9,11.3,18.8,4.6,6.7,7.7,7.8,7,7,6.6,6.8,6.7,6.1,7.1,6.3,7.2,6,5.8,6.6,6.5,6.4,7.2,8.4,6.5,6.5,5.5,6.7,7,7.5,6.5,7.5,7.2,6.3,7.3,8,7,8.2,6.5,6.8,7.5,7,6.1,5.7,6.7,4.3,6.3)
df <- data.frame(date_time, staff, reading)
Upvotes: 1
Views: 959
Reputation: 887138
The option would be to do this separately. Create a data.table
of the dates of interest and the corresponding 'staff', and get the full sequence of date time, then we rbind
this with the original dataset and using a condition, we summarise the data
library(data.table)
stf <- c("Jack", "David", "Kate")
date <- as.Date(c("2018-03-02", "2018-03-03", "2018-03-04"))
df1 <- data.table(date, staff= stf)[, .(date_time = seq(as.POSIXct(paste(date, "00:00:00"),
tz = "GMT"),
length.out = 24, by = "1 hour")), staff]
setDT(df)[, date_time := as.POSIXct(date_time, "%d/%m/%Y %H:%M", tz = "GMT")]
res <- rbindlist(list(df, df1), fill = TRUE)[,
.(reading = if(any(is.na(reading))) sum(reading, na.rm = TRUE) else reading),
.(staff, date_time)]
table(res$staff, as.Date(res$date_time))
# 2018-03-02 2018-03-03 2018-03-04
# David 3 24 2
# Jack 24 1 1
# Kate 3 1 24
head(res)
# staff date_time reading
#1: Jack 2018-03-02 00:00:00 7.5
#2: Jack 2018-03-02 01:00:00 8.3
#3: Kate 2018-03-02 02:00:00 7.0
#4: Jack 2018-03-02 03:00:00 6.9
#5: Jack 2018-03-02 05:00:00 7.1
#6: Jack 2018-03-02 06:00:00 8.1
tail(res)
# staff date_time reading
#1: Kate 2018-03-04 04:00:00 0
#2: Kate 2018-03-04 09:00:00 0
#3: Kate 2018-03-04 13:00:00 0
#4: Kate 2018-03-04 18:00:00 0
#5: Kate 2018-03-04 21:00:00 0
#6: Kate 2018-03-04 23:00:00 0
Upvotes: 1
Reputation: 5017
Try this code:
Identify each daily hour and all staff members
date_h<-seq(as.POSIXlt(min(date_time),format="%d/%m/%Y %H:%M"),as.POSIXlt(max(date_time),format="%d/%m/%Y %H:%M"),by=60*60)
staff_u<-unique(staff)
comb<-expand.grid(staff_u,date_h)
colnames(comb)<-c("staff","date_time")
Uniform date format in df
df$date_time<-as.POSIXlt(df$date_time,format="%d/%m/%Y %H:%M")
Merge information
out<-merge(comb,df,all.x=T)
Your output:
head(out)
staff date_time reading
1 Jack 2018-03-02 00:00:00 7.5
2 Jack 2018-03-02 01:00:00 8.3
3 Jack 2018-03-02 02:00:00 NA
4 Jack 2018-03-02 03:00:00 6.9
5 Jack 2018-03-02 04:00:00 NA
6 Jack 2018-03-02 05:00:00 7.1
Upvotes: 1