Reputation: 499
For each customer_ID I want to insert their missing time of half hour duration for each date with corresponding duration_1 and duration_2 with 0 values.
date id Time duration_1 duration_2
2021-06-14 1001 10:31-11:00 10 15
2021-06-14 1001 11:01-11:30 NA 28
2021-06-14 1001 11:31-12:00 NA 30
2021-06-14 1001 16:01-16:30 12 20
2021-06-14 1002 14:01-14:30 25 50
2021-06-14 1002 14:31-15:00 NA 60
2021-06-15 1001 18:01-18:30 20 35
2021-06-15 1001 18:31-19:00 28 40
For e.g. id 1001 has four rows for 14th June with duration_1 and duration_2 for time 10:31-11:00, 11:01-11:30 and so on. Time here is split with half hour duration. I want to insert the remaining time starting from 2:00am to 26.00am for each id each date. Time starts from 2.01 AM and goes upto 26:00 with half hour slot. i.e. after insertion each Id will have 48 records of half hour split for each date as show below
date id Time duration_1 duration_2
2021-06-14 1001 2:01-2:30 0 0
2021-06-14 1001 2:31-3:00 0 0
2021-06-14 1001 3:01-3:30 0 0
2021-06-14 1001 3:31-4:00 0 0
---- -- ---- - -
---- -- ---- - -
2021-06-15 1001 24:01-24:30 0 0
2021-06-15 1001 24:31-25:00 0 0
2021-06-15 1001 25:01-25:30 0 0
2021-06-15 1001 25:31-26:00 0 0
I got some clue here: Insert rows for missing weeks in r but not able to replicate the same logic here. Any help is indeed appreciated!
Upvotes: 0
Views: 47
Reputation: 388982
You can create a vector that you want to include for every date
-
vec <- sprintf('%d:%02d-%d:%02d', rep(2:25, each = 2), c(1, 31), c(2, rep(3:25, each = 2), 26), c(30, 0))
vec
# [1] "2:01-2:30" "2:31-3:00" "3:01-3:30" "3:31-4:00" "4:01-4:30"
# [6] "4:31-5:00" "5:01-5:30" "5:31-6:00" "6:01-6:30" "6:31-7:00"
#[11] "7:01-7:30" "7:31-8:00" "8:01-8:30" "8:31-9:00" "9:01-9:30"
#[16] "9:31-10:00" "10:01-10:30" "10:31-11:00" "11:01-11:30" "11:31-12:00"
#[21] "12:01-12:30" "12:31-13:00" "13:01-13:30" "13:31-14:00" "14:01-14:30"
#[26] "14:31-15:00" "15:01-15:30" "15:31-16:00" "16:01-16:30" "16:31-17:00"
#[31] "17:01-17:30" "17:31-18:00" "18:01-18:30" "18:31-19:00" "19:01-19:30"
#[36] "19:31-20:00" "20:01-20:30" "20:31-21:00" "21:01-21:30" "21:31-22:00"
#[41] "22:01-22:30" "22:31-23:00" "23:01-23:30" "23:31-24:00" "24:01-24:30"
#[46] "24:31-25:00" "25:01-25:30" "25:31-26:00"
#Convert to factor
vec <- factor(vec, vec)
and then you can use tidyr::complete
-
tidyr::complete(df, date, id, Time = vec,
fill = list(duration_1 = 0, duration_2 = 0))
# date id Time duration_1 duration_2
# <chr> <int> <chr> <dbl> <dbl>
# 1 2021-06-14 1001 10:01-10:30 0 0
# 2 2021-06-14 1001 10:31-11:00 10 15
# 3 2021-06-14 1001 11:01-11:30 0 28
# 4 2021-06-14 1001 11:31-12:00 0 30
# 5 2021-06-14 1001 12:01-12:30 0 0
# 6 2021-06-14 1001 12:31-13:00 0 0
# 7 2021-06-14 1001 13:01-13:30 0 0
# 8 2021-06-14 1001 13:31-14:00 0 0
# 9 2021-06-14 1001 14:01-14:30 0 0
#10 2021-06-14 1001 14:31-15:00 0 0
# … with 182 more rows
Upvotes: 1