elfeu
elfeu

Reputation: 21

Sum total hours by date in R

I would like to know how to sum the total hours by day using a condiction that only a time assigned with an X in the third column should be sum. I have this data.frame:

Date Time Run
17/04/12 00:10:00 x
17/04/12 00:19:00 x
17/04/12 00:25:00 x
17/04/12 00:29:00 x
17/04/12 00:25:00
17/04/12 00:29:00 x
17/04/12 00:30:00 x
18/04/12 00:10:00
18/04/12 00:14:00 x
18/04/12 00:20:00 x

Result:

Date Total_Time
17/04/12 00:20:00
18/04/12 00:06:00

I tried using lubridate and tidyverse packages, but I have had no success.

Thanks

Upvotes: 2

Views: 497

Answers (2)

danlooo
danlooo

Reputation: 10637

We can use run length encoding (rle) to divide the timepoints into leaps. Then, we can sum up the durations of the leaps:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

#' Run length encoding to output leap numbers
#' AAABBA => 111223
encode_leap <- function(x) {
  rle <- rle(x)
  rle$values <- rle$values %>%
    length() %>%
    seq()
  inverse.rle(rle)
}

data <- tribble(
  ~Date, ~Time, ~Run,
  "17/04/12", "00:10:00", "x",
  "17/04/12", "00:19:00", "x",
  "17/04/12", "00:25:00", "x",
  "17/04/12", "00:29:00", "x",
  "17/04/12", "00:25:00", NA,
  "17/04/12", "00:29:00", "x",
  "17/04/12", "00:30:00", "x",
  "18/04/12", "00:10:00", NA,
  "18/04/12", "00:14:00", "x",
  "18/04/12", "00:20:00", "x"
)
data
#> # A tibble: 10 x 3
#>    Date     Time     Run  
#>    <chr>    <chr>    <chr>
#>  1 17/04/12 00:10:00 x    
#>  2 17/04/12 00:19:00 x    
#>  3 17/04/12 00:25:00 x    
#>  4 17/04/12 00:29:00 x    
#>  5 17/04/12 00:25:00 <NA> 
#>  6 17/04/12 00:29:00 x    
#>  7 17/04/12 00:30:00 x    
#>  8 18/04/12 00:10:00 <NA> 
#>  9 18/04/12 00:14:00 x    
#> 10 18/04/12 00:20:00 x

data %>%
  mutate(date = Date) %>%
  # make year to 4 digits
  unite(Date, Time, col = "datetime", sep = "00 - ") %>%
  mutate(
    date,
    datetime = datetime %>% parse_datetime(format = "%d/%M/%Y - %H:%M:%S"),
    leap = encode_leap(.$Run)
  ) %>%
  group_by(date, leap) %>%
  summarise(duration = max(datetime) - min(datetime)) %>%
  group_by(date) %>%
  summarise(duration = sum(duration, na.rm = TRUE))
#> `summarise()` has grouped output by 'date'. You can override using the `.groups` argument.
#> # A tibble: 2 x 2
#>   date     duration 
#>   <chr>    <drtn>   
#> 1 17/04/12 1200 secs
#> 2 18/04/12  360 secs

Created on 2021-10-07 by the reprex package (v2.0.1)

Upvotes: 2

Vin&#237;cius F&#233;lix
Vin&#237;cius F&#233;lix

Reputation: 8826

I am not sure if this is what you want to do...

library(lubridate)
library(dplyr)

df %>% 
  mutate(
    Date = dmy(Date),
    Time = hms(Time),
    # minute(Time)
  ) %>% 
  filter(Run == "x") %>% 
  group_by(Date) %>% 
  summarise(Total_Time = seconds_to_period(sum(period_to_seconds(Time))))

# A tibble: 2 x 2
  Date       Total_Time
  <date>     <Period>  
1 2012-04-17 1H 39M 0S 
2 2012-04-18 34M 0S 

Upvotes: 1

Related Questions