edsandorf
edsandorf

Reputation: 767

Fill in gaps between irregular end and start timestamps

I am working on a simple timesheet/time tracking app in Shiny for personal use. The app will record timestamps for when I start and stop activities. However, there are times during the day where there are natural gaps between specific tasks where you still do stuff. These in-between periods are not captured by the app, but are present in the data as "gaps" between the timestamps. Sample data looks like this (dput of the data at the end of the post):

# A tibble: 9 x 3
  start               end                 activity  
  <dttm>              <dttm>              <chr>     
1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
3 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
4 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
5 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
6 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
7 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
8 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
9 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2

The gaps in the data are obvious. For example, on the 28th there is no gap between the first and the second entry (end time of the first entry is equal to the start time of the second entry). There is, however, a gap between the second entry and the third entry (the end time of the second entry differs from the third entry). We can find similar gaps for the other days in the sample data.

What I want to do is fill in these gaps with an activity called "Other", such that for each day there are no gaps between the start of the first and the end of the last entry. That is, all existing gaps are filled in. The desired output would look like this:

# A tibble: 13 x 3
   start               end                 activity  
   <dttm>              <dttm>              <chr>     
 1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
 2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
 3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other     
 4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
 5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
 6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other     
 7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
 8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other     
 9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other     
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2

The data will be created daily for the forseeable future, so potentially the solution will have to work on larger datasets and a vectorized approach is preferable. Sofar, I've been working within the tidyverse and with lubridate. I am not sure if there is something simple/easy that I've overlookd (I hope so).

The first thing I thought about was to write a loop or using a lapply type expression. This can quicly get out hand as the data grows unless I remember to always fill in or run checks and fill in data regularly (I will probably get to this part of the app eventually).

Alternatively, I started thinking about pivoting the data longer creating groups of 2 matches with start and end times for each day to work out the gaps. This could potentially be quick, but I struggled to find a good way of setting up the problem.

If it matters, the data is submitted to a local SQLite database everytime an entry is added.

Any help/input on this is much appreciated.

Sample data:

library(tidyverse)
library(lubridate)

db <- structure(list(start = structure(c(1669629600, 1669632630, 1669636860, 
1669716000, 1669719031, 1669723200, 1669726800, 1669795205, 1669824185
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), end = structure(c(1669632630, 
1669635545, 1669651845, 1669719030, 1669719615, 1669726800, 1669737600, 
1669802400, 1669827600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    activity = c("Activity 1", "Activity 2", "Activity 2", "Activity 1", 
    "Activity 4", "Activity 5", "Activity 2", "Activity 1", "Activity 2"
    )), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

Upvotes: 2

Views: 57

Answers (2)

Ben
Ben

Reputation: 30539

Here is another approach using tidyverse. Add a column to indicate next row start time. Then, filter for rows where there are gaps (end time is less than the next row start time). Change those rows to "Other" and revise the times for the gap. Then, use bind_rows to add back to original data.frame.

library(tidyverse)
library(lubridate)

db %>%
  group_by(day(start)) %>%
  mutate(nextstart = lead(start)) %>%
  ungroup() %>%
  filter(end < nextstart) %>%
  transmute(start = end, end = nextstart, activity = "Other") %>%
  bind_rows(db) %>%
  arrange(start)

Output

   start               end                 activity  
   <dttm>              <dttm>              <chr>     
 1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
 2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
 3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other     
 4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
 5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
 6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other     
 7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
 8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other     
 9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other     
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2

Upvotes: 2

Vin&#237;cius F&#233;lix
Vin&#237;cius F&#233;lix

Reputation: 8826

Here a way to do it:

library(dplyr)
library(lubridate)

db %>% 
  bind_rows(
    data.frame(
      start = db %>% 
        group_by(day(start)) %>% 
        filter(end != lead(start)) %>% 
        pull(end),
      end = NA,
      activity= "Other"
    )
  ) %>% 
  arrange(start) %>% 
  mutate(end = if_else(is.na(end),lead(start),end))

# A tibble: 13 x 3
   start               end                 activity  
   <dttm>              <dttm>              <chr>     
 1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
 2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
 3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other     
 4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
 5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
 6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other     
 7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
 8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other     
 9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other     
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2

Upvotes: 2

Related Questions