Reputation: 21
I would like to know how to sum the total hours by day using a condiction that only a time assigned with an X in the third column should be sum. I have this data.frame:
Date | Time | Run |
---|---|---|
17/04/12 | 00:10:00 | x |
17/04/12 | 00:19:00 | x |
17/04/12 | 00:25:00 | x |
17/04/12 | 00:29:00 | x |
17/04/12 | 00:25:00 | |
17/04/12 | 00:29:00 | x |
17/04/12 | 00:30:00 | x |
18/04/12 | 00:10:00 | |
18/04/12 | 00:14:00 | x |
18/04/12 | 00:20:00 | x |
Result:
Date | Total_Time |
---|---|
17/04/12 | 00:20:00 |
18/04/12 | 00:06:00 |
I tried using lubridate and tidyverse packages, but I have had no success.
Thanks
Upvotes: 2
Views: 497
Reputation: 10637
We can use run length encoding (rle) to divide the timepoints into leaps. Then, we can sum up the durations of the leaps:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
#' Run length encoding to output leap numbers
#' AAABBA => 111223
encode_leap <- function(x) {
rle <- rle(x)
rle$values <- rle$values %>%
length() %>%
seq()
inverse.rle(rle)
}
data <- tribble(
~Date, ~Time, ~Run,
"17/04/12", "00:10:00", "x",
"17/04/12", "00:19:00", "x",
"17/04/12", "00:25:00", "x",
"17/04/12", "00:29:00", "x",
"17/04/12", "00:25:00", NA,
"17/04/12", "00:29:00", "x",
"17/04/12", "00:30:00", "x",
"18/04/12", "00:10:00", NA,
"18/04/12", "00:14:00", "x",
"18/04/12", "00:20:00", "x"
)
data
#> # A tibble: 10 x 3
#> Date Time Run
#> <chr> <chr> <chr>
#> 1 17/04/12 00:10:00 x
#> 2 17/04/12 00:19:00 x
#> 3 17/04/12 00:25:00 x
#> 4 17/04/12 00:29:00 x
#> 5 17/04/12 00:25:00 <NA>
#> 6 17/04/12 00:29:00 x
#> 7 17/04/12 00:30:00 x
#> 8 18/04/12 00:10:00 <NA>
#> 9 18/04/12 00:14:00 x
#> 10 18/04/12 00:20:00 x
data %>%
mutate(date = Date) %>%
# make year to 4 digits
unite(Date, Time, col = "datetime", sep = "00 - ") %>%
mutate(
date,
datetime = datetime %>% parse_datetime(format = "%d/%M/%Y - %H:%M:%S"),
leap = encode_leap(.$Run)
) %>%
group_by(date, leap) %>%
summarise(duration = max(datetime) - min(datetime)) %>%
group_by(date) %>%
summarise(duration = sum(duration, na.rm = TRUE))
#> `summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
#> # A tibble: 2 x 2
#> date duration
#> <chr> <drtn>
#> 1 17/04/12 1200 secs
#> 2 18/04/12 360 secs
Created on 2021-10-07 by the reprex package (v2.0.1)
Upvotes: 2
Reputation: 8826
I am not sure if this is what you want to do...
library(lubridate)
library(dplyr)
df %>%
mutate(
Date = dmy(Date),
Time = hms(Time),
# minute(Time)
) %>%
filter(Run == "x") %>%
group_by(Date) %>%
summarise(Total_Time = seconds_to_period(sum(period_to_seconds(Time))))
# A tibble: 2 x 2
Date Total_Time
<date> <Period>
1 2012-04-17 1H 39M 0S
2 2012-04-18 34M 0S
Upvotes: 1