Henk Straten
Henk Straten

Reputation: 1447

How to perform a sum on POSIXT values using Dplyr?

I have the following dataset

df <- data.frame(
 courts = c("Court 1", "Court 1", "Court 2"),
 time = c("1:00 hrs", "3:00 hrs", "2:00 hrs") 
)

I now remove the space like this

df <- df %>%
 mutate(
 time_without_hrs = sub(" hrs", "", time)
)

And format the hrs as date.time

 df$time_without_hrs <- as.POSIXct(df$time_without_hrs, format="%H:%M")

However, now when I want to sum the courts by hours ....

summary_df <- df %>%
 group_by(courts) %>%
 summarize(
 sum_value1 = sum(time_without_hrs),
 )

I get the following error. Any thoughts on what goes wrong?

Error in `summarize()`:
ℹ In argument: `sum_value1 = sum(time_without_hrs)`.
ℹ In group 1: `courts = "Court 1"`.
Caused by error in `Summary.POSIXct()`:
! 'sum' not defined for "POSIXt" objects
Run `rlang::last_trace()` to see where the error occurred.

Upvotes: 1

Views: 68

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269644

Convert the time to "difftime" class and then sum. Finally convert that to numeric.

Optionally omit %>% as.numeric if a difftime object is ok as the output for the time column.

Have incorporated suggestion from comments by @ThomasIsCoding

library(dplyr)

df %>%
  summarize(time = time %>%
       as.difftime(format = "%H:%M hrs") %>%
       sum %>% 
       as.numeric, .by = courts)
##    courts time
## 1 Court 1    4
## 2 Court 2    2

Note

Input used

df <- data.frame(
 courts = c("Court 1", "Court 1", "Court 2"),
 time = c("1:00 hrs", "3:00 hrs", "2:00 hrs") 
)

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101383

In your case, what you want to sum up is the time interval (difference), not the time instance that POSIXct represents.

You can use as.difftime like below

df %>%
    summarise(sum_hrs = sum(as.difftime(time,
        format = "%H:%M hrs",
        units = "hours"
    )), .by = courts)

gives

   courts sum_hrs
1 Court 1 4 hours
2 Court 2 2 hours

Upvotes: 1

Edward
Edward

Reputation: 18683

The error message is clear: "! 'sum' not defined for "POSIXt" objects".

Try lubridate's hm and hour functions to convert the original character representation into a numeric value representing the number of whole hours.

library(lubridate)

df %>%
  mutate(time_in_hrs=hour(
    hm(time)
  )) %>%
  group_by(courts) %>%
  summarize(
    sum_value1 = sum(time_in_hrs),
  )

# A tibble: 2 x 2
  courts  sum_value1
  <chr>        <dbl>
1 Court 1          4
2 Court 2          2

Upvotes: 1

Related Questions