Reputation: 767
I am working on a simple timesheet/time tracking app in Shiny for personal use. The app will record timestamps for when I start and stop activities. However, there are times during the day where there are natural gaps between specific tasks where you still do stuff. These in-between periods are not captured by the app, but are present in the data as "gaps" between the timestamps. Sample data looks like this (dput of the data at the end of the post):
# A tibble: 9 x 3
start end activity
<dttm> <dttm> <chr>
1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
3 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
4 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
5 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
6 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
7 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
8 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
9 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2
The gaps in the data are obvious. For example, on the 28th there is no gap between the first and the second entry (end time of the first entry is equal to the start time of the second entry). There is, however, a gap between the second entry and the third entry (the end time of the second entry differs from the third entry). We can find similar gaps for the other days in the sample data.
What I want to do is fill in these gaps with an activity called "Other", such that for each day there are no gaps between the start of the first and the end of the last entry. That is, all existing gaps are filled in. The desired output would look like this:
# A tibble: 13 x 3
start end activity
<dttm> <dttm> <chr>
1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other
4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other
7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other
9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2
The data will be created daily for the forseeable future, so potentially the solution will have to work on larger datasets and a vectorized approach is preferable. Sofar, I've been working within the tidyverse and with lubridate. I am not sure if there is something simple/easy that I've overlookd (I hope so).
The first thing I thought about was to write a loop or using a lapply type expression. This can quicly get out hand as the data grows unless I remember to always fill in or run checks and fill in data regularly (I will probably get to this part of the app eventually).
Alternatively, I started thinking about pivoting the data longer creating groups of 2 matches with start and end times for each day to work out the gaps. This could potentially be quick, but I struggled to find a good way of setting up the problem.
If it matters, the data is submitted to a local SQLite database everytime an entry is added.
Any help/input on this is much appreciated.
Sample data:
library(tidyverse)
library(lubridate)
db <- structure(list(start = structure(c(1669629600, 1669632630, 1669636860,
1669716000, 1669719031, 1669723200, 1669726800, 1669795205, 1669824185
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), end = structure(c(1669632630,
1669635545, 1669651845, 1669719030, 1669719615, 1669726800, 1669737600,
1669802400, 1669827600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
activity = c("Activity 1", "Activity 2", "Activity 2", "Activity 1",
"Activity 4", "Activity 5", "Activity 2", "Activity 1", "Activity 2"
)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))
Upvotes: 2
Views: 57
Reputation: 30539
Here is another approach using tidyverse. Add a column to indicate next row start
time. Then, filter for rows where there are gaps (end
time is less than the next row start
time). Change those rows to "Other" and revise the times for the gap. Then, use bind_rows
to add back to original data.frame.
library(tidyverse)
library(lubridate)
db %>%
group_by(day(start)) %>%
mutate(nextstart = lead(start)) %>%
ungroup() %>%
filter(end < nextstart) %>%
transmute(start = end, end = nextstart, activity = "Other") %>%
bind_rows(db) %>%
arrange(start)
Output
start end activity
<dttm> <dttm> <chr>
1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other
4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other
7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other
9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2
Upvotes: 2
Reputation: 8826
Here a way to do it:
library(dplyr)
library(lubridate)
db %>%
bind_rows(
data.frame(
start = db %>%
group_by(day(start)) %>%
filter(end != lead(start)) %>%
pull(end),
end = NA,
activity= "Other"
)
) %>%
arrange(start) %>%
mutate(end = if_else(is.na(end),lead(start),end))
# A tibble: 13 x 3
start end activity
<dttm> <dttm> <chr>
1 2022-11-28 10:00:00 2022-11-28 10:50:30 Activity 1
2 2022-11-28 10:50:30 2022-11-28 11:39:05 Activity 2
3 2022-11-28 11:39:05 2022-11-28 12:01:00 Other
4 2022-11-28 12:01:00 2022-11-28 16:10:45 Activity 2
5 2022-11-29 10:00:00 2022-11-29 10:50:30 Activity 1
6 2022-11-29 10:50:30 2022-11-29 10:50:31 Other
7 2022-11-29 10:50:31 2022-11-29 11:00:15 Activity 4
8 2022-11-29 11:00:15 2022-11-29 12:00:00 Other
9 2022-11-29 12:00:00 2022-11-29 13:00:00 Activity 5
10 2022-11-29 13:00:00 2022-11-29 16:00:00 Activity 2
11 2022-11-30 08:00:05 2022-11-30 10:00:00 Activity 1
12 2022-11-30 10:00:00 2022-11-30 16:03:05 Other
13 2022-11-30 16:03:05 2022-11-30 17:00:00 Activity 2
Upvotes: 2