AEF

Error in dplyr::summarise when working with datetimes and lubridate::dseconds

I have a tibble representing log messages. It has (among others) two datetime columns: EventDateTime and FileCreationDateTime.

I now want to find the start time, the end time, and the duration of each log file (identified by FileCreationDateTime). I thought this could be done with the following code:

file_durations <- 
  logMessages%>%
  group_by(FileCreationDateTime) %>% 
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))

The code itself seems to run without error. However, I can neither print the result nor access it (at least not the column "duration"), because doing so returns the error

Error in sprintf("%ds (~%s %ss)", x, x2, unit, "s)") : 
  invalid format '%d'; use format %f, %e, %g or %a for numeric objects

Investigating, I found that the error seems to depend on the exact values of the datetimes. I have put together a MWE with two tibbles. The two tibbles differ only in one value. One works, while the other doesn't. I have no idea what could cause the error. Can someone enlighten me?

The human readable tibbles:

> working
# A tibble: 2 × 2
            EventDateTime FileCreationDateTime
                   <dttm>               <dttm>
1 2016-11-24 16:16:44.986  2016-11-24 16:16:46
2 2016-11-24 16:17:43.282  2016-11-24 16:16:46

> broken
# A tibble: 2 × 2
            EventDateTime FileCreationDateTime
                   <dttm>               <dttm>
1 2016-11-24 16:16:44.986  2016-11-24 16:16:46
2 2016-11-24 16:18:31.971  2016-11-24 16:16:46

The complete MWE:

library(tidyverse)
library(lubridate)

options(digits.secs = 6, digits = 6)

working <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004263.283),
                                                    class = c("POSIXct", "POSIXt"),
                                                    tzone = "UTC"),
                          FileCreationDateTime = structure(c(1480000606, 1480000606),
                                                           class = c("POSIXct", "POSIXt"),
                                                           tzone = "Europe/Vienna")),
                     .Names = c("EventDateTime", "FileCreationDateTime"),
                     row.names = c(NA, -2L),
                     class = c("tbl_df", "tbl", "data.frame"))

working %>%
  group_by(FileCreationDateTime) %>% 
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))

broken  <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004311.972),
                                                    class = c("POSIXct", "POSIXt"),
                                                    tzone = "UTC"),
                          FileCreationDateTime = structure(c(1480000606, 1480000606),
                                                           class = c("POSIXct", "POSIXt"),
                                                           tzone = "Europe/Vienna")),
                     .Names = c("EventDateTime", "FileCreationDateTime"),
                     row.names = c(NA, -2L),
                     class = c("tbl_df", "tbl", "data.frame"))

broken %>%
  group_by(FileCreationDateTime) %>% 
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))

I am using R 3.4.0 64bit, lubridate_1.6.0 and dplyr_0.5.0 on Windows 10.

Thanks for any help!

Answers (1)

AEF

I finally found the problem. It has nothing to do with dplyr; the cause is lubridate::dseconds. As already reported (e.g. this issue), it fails on non-integer inputs > 60. This was apparently my problem as well.
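
For anyone hitting the same error, here is a minimal sketch of one possible workaround, assuming plain numeric seconds are acceptable in place of a lubridate Duration. The first part uses only base R to show why the message mentions '%d': sprintf accepts whole-number doubles for that format but rejects fractional ones. The second part recomputes the summary for the "broken" data from the question. The column name duration_secs and the variable fmt_error are my own illustrative names, and I have not checked which lubridate release (if any) fixes the underlying issue.

```r
library(dplyr)

# Base R: "%d" accepts whole-number doubles but rejects fractional ones,
# which matches the error message quoted in the question.
sprintf("%d", 106)
fmt_error <- tryCatch(sprintf("%d", 106.985),
                      error = function(e) conditionMessage(e))

# Same values as the "broken" tibble from the question.
broken <- data.frame(
  EventDateTime = as.POSIXct(c(1480004204.987, 1480004311.972),
                             origin = "1970-01-01", tz = "UTC"),
  FileCreationDateTime = as.POSIXct(c(1480000606, 1480000606),
                                    origin = "1970-01-01", tz = "Europe/Vienna")
)

file_durations <- broken %>%
  group_by(FileCreationDateTime) %>%
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            # Plain numeric seconds (hypothetical column name). The units are
            # fixed explicitly instead of being picked by `end - start`.
            duration_secs = as.numeric(difftime(end, start, units = "secs")))

print(file_durations)
```

Using difftime with units = "secs" also sidesteps the fact that subtracting two POSIXct values picks its units automatically (seconds, minutes, hours, ...), so the number always means seconds.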
