Dmytro Fedoriuk
Dmytro Fedoriuk

Reputation: 351

Fill NA by last observed value from other column, modified by adding some constant

I have start of some process, end of it and process duration.

        process_start            process_end    hourly_process_duration
  2019-01-01 00:00:00    2019-01-01 12:00:00                         12
  2019-01-01 12:00:00    2019-01-01 13:00:00                          1
                   NA                     NA                         11
                   NA                     NA                         15 
  2019-01-02 15:00:00    2019-01-02 18:00:00                          3

I always have hourly_process_duration. Processes are continuous - when one process ends the next one begins.

I need to replace NA correctly. Like in the example:

        process_start            process_end    hourly_process_duration
  2019-01-01 00:00:00    2019-01-01 12:00:00                         12
  2019-01-01 12:00:00    2019-01-01 13:00:00                          1
  2019-01-01 13:00:00    2019-01-02 00:00:00                         11
  2019-01-02 00:00:00    2019-01-02 15:00:00                         15 
  2019-01-02 15:00:00    2019-01-02 18:00:00                          3

Upvotes: 1

Views: 23

Answers (1)

akrun
akrun

Reputation: 887068

Here is one option to fill the missing date time

library(dplyr)
library(lubridate)
df1 %>%
   mutate(process_start = coalesce(process_start, lag(process_end)), 
          process_end = coalesce(process_end, lead(process_start))) %>% 
   mutate_at(vars(process_start, process_end), ymd_hms) %>% 
   mutate_at(vars(process_start, process_end), 
     list(~ replace(., is.na(.), floor_date(.[which(is.na(.))+1], "day"))))
#        process_start         process_end hourly_process_duration
#1 2019-01-01 00:00:00 2019-01-01 12:00:00                      12
#2 2019-01-01 12:00:00 2019-01-01 13:00:00                       1
#3 2019-01-01 13:00:00 2019-01-02 00:00:00                      11
#4 2019-01-02 00:00:00 2019-01-02 15:00:00                      15
#5 2019-01-02 15:00:00 2019-01-02 18:00:00                       3

data

df1 <- structure(list(process_start = c("2019-01-01 00:00:00", 
    "2019-01-01 12:00:00", 
NA, NA, "2019-01-02 15:00:00"), process_end = c("2019-01-01 12:00:00", 
"2019-01-01 13:00:00", NA, NA, "2019-01-02 18:00:00"), 
hourly_process_duration = c(12L, 
1L, 11L, 15L, 3L)), class = "data.frame", row.names = c(NA, -5L
))

Upvotes: 1

Related Questions