joker21
joker21

Reputation: 339

How to find the accumulated time between two

I have a two datatables like below:

DT1

  id            start_time             end_time
 604        2017-08-10 18:44:14    2017-08-11 19:33:17
 604        2017-08-10 20:38:20    2017-08-11 20:44:44
 604        2017-08-10 20:54:26    2017-08-11 20:58:48
 604        2017-08-10 21:35:50    2017-08-11 22:03:14
 604        2017-08-10 22:05:42    2017-08-11 22:17:12

.......

DT2

 id             t1                            t2
 604        2017-08-10 18:40:14    2017-08-11 18:44:14
 604        2017-08-10 18:44:14    2017-08-11 18:47:14
 604        2017-08-10 19:44:14    2017-08-11 19:47:14
 604        2017-08-10 20:30:14    2017-08-11 20:42:20
 604        2017-08-10 21:44:14    2017-08-11 21:49:14
 604        2017-08-10 22:44:14    2017-08-11 22:48:14

......

From these two I want to identify the accumulation of the DT2 rows lying inbetween the range of DT!.For example row 1 is starting from 2017-08-10 18:44:14 to 2017-08-11 19:33:17 So I want to take the accumulation of time in DT between that period and attach as column fro eah row in DT1 for example the first row would be like

   id            start_time             end_time           durationFromDT2
   604        2017-08-10 18:44:14    2017-08-11 19:33:17       420
   604        2017-08-10 20:38:20    2017-08-11 20:44:44       240

The 420 is because of (2017-08-11 18:44:14-2017-08-10 18:40:14)+(2017-08-11 18:47:14-2017-08-10 18:44:14) .If there are no occurences in DT2 in the time period then I would like it to be 0.Like wise I have to group by many id's as well.

The 240 because of the time is (2017-08-11 20:42:20-2017-08-10 20:38:20)

So it's basically the coverage of DT2 for DT1 time period I tried looping through the rows but did not go well.Instead I am looking for any dplyr or data table solutions.Because looping was not working well

Any help is appreciated.

Upvotes: 1

Views: 85

Answers (1)

leerssej
leerssej

Reputation: 14958

Assuming the rectifications suggested in the comments are correct, please find a dplyr solution below:

merge(DT1, DT2, by = "id", all = TRUE) %>%
    filter(t2 >= start_time, t1 <= end_time) %>% 
    mutate(t1_adj = if_else(start_time > t1, start_time, t1),
           t2_adj = if_else(end_time < t2, end_time, t2),
           difftime = difftime(t2_adj, t1_adj, units = "secs")) %>% 
    group_by(id, start_time, end_time) %>% 
    summarize(durationFromDT2 = sum(difftime)) %>% 
    right_join(DT1) %>% 
    mutate(durationFromDT2 = coalesce(durationFromDT2, 0))
  • cross join all records between each of the tables

  • filter down the cross join to contain just those DT2 times that have a timepoint within the DT1 interval.

  • mutate _adjusted columns to force the t1 and t2 times to remain constrained between DT1 start and end timepoints, and then calculate the time intervals.

  • group and summarize the durations by DT1 records.

  • To list all of the NULL records in the final result, right_join the DT1 table back on again.

  • Replace the NAs with 0's.

The result looks like this:

# A tibble: 5 x 4
# Groups:   id, start_time [5]
     id          start_time            end_time durationFromDT2
  <int>              <dttm>              <dttm>          <time>
1   604 2017-08-10 18:44:14 2017-08-10 19:33:17        180 secs
2   604 2017-08-10 20:38:20 2017-08-10 20:44:44        240 secs
3   604 2017-08-10 20:54:26 2017-08-10 20:58:48          0 secs
4   604 2017-08-10 21:35:50 2017-08-10 22:03:14        300 secs
5   604 2017-08-10 22:05:42 2017-08-10 22:17:12          0 secs

Reproducible (rectified) sample dataframes from the OP are below:

library(lubridate)
DT1 <- 
    read.table(text = "
                        id         start_date start_time  end_date   end_time
                        604        2017-08-10 18:44:14    2017-08-10 19:33:17
                        604        2017-08-10 20:38:20    2017-08-10 20:44:44
                        604        2017-08-10 20:54:26    2017-08-10 20:58:48
                        604        2017-08-10 21:35:50    2017-08-10 22:03:14
                        604        2017-08-10 22:05:42    2017-08-10 22:17:12
                      ", header = TRUE, stringsAsFactors = FALSE) %>% 
    mutate(start_time = ymd_hms(paste(start_date, start_time)),
           end_time   = ymd_hms(paste(end_date, end_time))) %>% 
    select(-c(start_date, end_date))

DT2 <- 
    read.table(text = "
                         id         d1         t1          d2         t2
                         604        2017-08-10 18:40:14    2017-08-10 18:44:14
                         604        2017-08-10 18:44:14    2017-08-10 18:47:14
                         604        2017-08-10 19:44:14    2017-08-10 19:47:14
                         604        2017-08-10 20:30:14    2017-08-10 20:42:20
                         604        2017-08-10 21:44:14    2017-08-10 21:49:14
                         604        2017-08-10 22:44:14    2017-08-10 22:48:14
                      ", header = TRUE, stringsAsFactors = FALSE) %>% 
    mutate(t1 = ymd_hms(paste(d1,t1)),
           t2 = ymd_hms(paste(d2,t2)),
           ) %>% 
    select(-c(d1, d2))

Upvotes: 2

Related Questions