I have a tibble representing log messages. It has (among others) two columns, EventDateTime and FileCreationDateTime.
What I now want to do is to find the start time, the end time and the duration of each logfile (identified by FileCreationDateTime). I think (or thought) this can be done with the following code:
file_durations <-
  logMessages %>%
  group_by(FileCreationDateTime) %>%
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))
The code itself seems to run without error. However, I can neither print the result nor access it (at least not the column "duration"), because doing so returns the error
Error in sprintf("%ds (~%s %ss)", x, x2, unit, "s)") :
invalid format '%d'; use format %f, %e, %g or %a for numeric objects
Investigating, I found that the error seems to depend on the exact values of the datetimes. I have put together an MWE with two tibbles that differ in only one value: one works, while the other doesn't. I have no idea what could cause the error. Can someone enlighten me?
The human readable tibbles:
> working
# A tibble: 2 × 2
EventDateTime FileCreationDateTime
<dttm> <dttm>
1 2016-11-24 16:16:44.986 2016-11-24 16:16:46
2 2016-11-24 16:17:43.282 2016-11-24 16:16:46
> broken
# A tibble: 2 × 2
EventDateTime FileCreationDateTime
<dttm> <dttm>
1 2016-11-24 16:16:44.986 2016-11-24 16:16:46
2 2016-11-24 16:18:31.971 2016-11-24 16:16:46
The complete MWE:
library(tidyverse)
library(lubridate)
options(digits.secs = 6, digits = 6)
working <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004263.283),
                                                    class = c("POSIXct", "POSIXt"),
                                                    tzone = "UTC"),
                          FileCreationDateTime = structure(c(1480000606, 1480000606),
                                                           class = c("POSIXct", "POSIXt"),
                                                           tzone = "Europe/Vienna")),
                     .Names = c("EventDateTime", "FileCreationDateTime"),
                     row.names = c(NA, -2L),
                     class = c("tbl_df", "tbl", "data.frame"))
working %>%
  group_by(FileCreationDateTime) %>%
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))
broken <- structure(list(EventDateTime = structure(c(1480004204.987, 1480004311.972),
                                                   class = c("POSIXct", "POSIXt"),
                                                   tzone = "UTC"),
                         FileCreationDateTime = structure(c(1480000606, 1480000606),
                                                          class = c("POSIXct", "POSIXt"),
                                                          tzone = "Europe/Vienna")),
                    .Names = c("EventDateTime", "FileCreationDateTime"),
                    row.names = c(NA, -2L),
                    class = c("tbl_df", "tbl", "data.frame"))
broken %>%
  group_by(FileCreationDateTime) %>%
  summarise(start = min(EventDateTime),
            end = max(EventDateTime),
            duration = dseconds(end - start))
I am using R 3.4.0 64bit, lubridate_1.6.0 and dplyr_0.5.0 on Windows 10.
Thanks for any help!
I finally found the problem. It has nothing to do with dplyr but with lubridate::dseconds. As already reported (e.g. this issue), it fails on non-integer inputs > 60. This was apparently also my problem.
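One possible way to sidestep the problem, assuming a plain numeric number of seconds is acceptable in place of a lubridate Duration, is to compute the difference with base R's difftime and explicit units. This is only a sketch: duration_secs is a hypothetical helper name that does not come from lubridate or dplyr, and the two timestamps are taken from the "broken" tibble above.

```r
# Hypothetical helper (not part of lubridate or dplyr): returns the elapsed
# time between two POSIXct values as a plain numeric number of seconds.
# Fixing units = "secs" avoids difftime's automatic unit selection.
duration_secs <- function(start, end) {
  as.numeric(difftime(end, start, units = "secs"))
}

# The two EventDateTime values from the "broken" tibble
start <- as.POSIXct(1480004204.987, origin = "1970-01-01", tz = "UTC")
end   <- as.POSIXct(1480004311.972, origin = "1970-01-01", tz = "UTC")

d <- duration_secs(start, end)
print(d)  # a non-integer value above 60, i.e. the case that triggered the error
```

In the summarise call, duration = duration_secs(start, end) could then stand in for duration = dseconds(end - start); the column becomes an ordinary double, so printing it does not go through the Duration formatting code.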