PatrickT

Reputation: 10510

R: ggplot with durations

Question: Your advice for handling durations with ggplot2 (author: Hadley Wickham). Specifically: reproduce the plots below with custom breaks and suitable labels. Preference for minimal use of custom functions and/or refactoring of data. Suggestions with packages I have not cited are welcome.

The data is stored in seconds (see df below). I would like to display human-readable breaks and labels, e.g. days instead of thousands of seconds, with the breaks occurring at 0, 1, 2... days instead of awkward fractions.

Proof of effort: The first example below treats durations as plain numbers and reaches the objective by case-by-case division by the appropriate factors (60, 24, 365, etc.). The second example uses base R difftime objects. To get it right in this case, I had to use the strftime function and subtract 1. Have I missed something? The third example uses the duration class from the lubridate package. While specifying labels was quite easy with the day() and seconds_to_period() functions, I did not find a neat way to set custom breaks. The fourth example uses the hms class. I managed to specify breaks, but not the labels. Suggestions on how to shorten the code in any of the examples below are also welcome.

# Data
df = data.frame(x = 1:6, 
    num = c(374400, 343500, 174000, 193500, 197700, 270300))

# base/difftime
df$difftime <- as.difftime(df$num, units = "secs")

# lubridate/duration
library("lubridate")  # devtools::install_github("tidyverse/lubridate") # the dev version fixes a bug
df$duration <- duration(df$num, units = "seconds")

# hms/hms
library("hms")
df$hms <- as_hms(df$num)  # as_hms() replaces the deprecated as.hms()

library("ggplot2")
library("scales")

# 1: data is base/numeric
# Pro: no package dependence
# Con: Hard work 
breaks = seq(0, 100*60*60, 20*60*60)
labels = function(x) round(x/60/60/24, 0)
ggplot(data = df, aes(x = x, y = num)) +
    geom_bar(stat = "identity", fill = "lightblue") +
    scale_y_continuous(name = "Duration (Days)", 
                       breaks = breaks,
                       labels = labels) +
    labs(title = "Data stored as numeric (seconds)", 
         subtitle = "breaks = seq(0, 100*60*60, 20*60*60)\nlabels = function(x) round(x/60/60/24, 0)",
         x = NULL) 
ggsave("base-num.png")
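
A side note on the breaks in example 1: they fall every 20 hours, not every day, so rounding the labels to whole days shows the label 2 twice. The base-R sketch below checks this and builds one break per whole day instead. The names secs_per_day, old_breaks, day_breaks and day_labels are made up for illustration and are not part of the original code.

```r
df <- data.frame(x = 1:6,
                 num = c(374400, 343500, 174000, 193500, 197700, 270300))

secs_per_day <- 24 * 60 * 60

# Breaks used in example 1: every 20 hours, labelled after rounding to days.
old_breaks <- seq(0, 100 * 60 * 60, 20 * 60 * 60)
round(old_breaks / secs_per_day, 0)  # the label 2 appears twice

# Alternative: one break per whole day, up to the first day above the maximum.
day_breaks <- seq(0, ceiling(max(df$num) / secs_per_day) * secs_per_day,
                  by = secs_per_day)
day_labels <- function(x) x / secs_per_day
day_labels(day_breaks)
```

Passing day_breaks and day_labels to the breaks and labels arguments of scale_y_continuous() should then put the ticks on whole days.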


# 2: data is base/difftime
# Pro: simple once you get over the ``strftime(x, "%d")`` syntax.
# Unresolved: Why do I need to subtract a day?
labels = function(x) as.integer(strftime(x, "%d"))-1
ggplot(data = df, aes(x = x, y = difftime)) +
    geom_bar(stat = "identity", fill = "lightblue") +
    scale_y_time(name = "Duration (Days)", 
        labels = labels) +
    labs(title = "Data stored as difftime (seconds)", 
         subtitle = "default breaks\nlabels = function(x) as.integer(strftime(x, '%d'))-1",
         x = NULL) 
ggsave("base-difftime.png")
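
On the open question of the subtracted day, a plausible explanation is that %d returns the day of the month, which starts at 1 rather than 0. This assumes the values reaching the label function are converted by strftime() into date-times counted from the Unix epoch (1970-01-01). The base-R sketch below reproduces the effect outside ggplot2; the variable names are illustrative only.

```r
# Treat durations (in seconds) as offsets from the Unix epoch, in UTC.
secs <- c(0, 1, 2) * 24 * 60 * 60
as_datetime <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# %d is the day of the month, so a zero duration lands on day 1.
day_of_month <- as.integer(strftime(as_datetime, "%d", tz = "UTC"))
day_of_month

# Dividing the raw seconds avoids the offset and also works past 31 days.
whole_days <- secs %/% (24 * 60 * 60)
whole_days
```

Under the same assumption, a label function along the lines of function(x) as.numeric(x) %/% (24*60*60) would not need the correction, although I have not verified this against every ggplot2 version.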


# 3: data is lubridate/duration
# Pro: intuitive combination of day() and seconds_to_period() functions
# Unresolved: a better way to make own breaks?
breaks = as.duration(seq(0, 5, 1)*60*60*24)
labels = function(x) day(seconds_to_period(x))
ggplot(data = df, aes(x = x, y = duration)) +
    geom_bar(stat = "identity", fill = "lightblue") +
    scale_y_continuous(name = "Duration (Days)", 
        breaks = breaks,
        labels = labels) +
    labs(title = "Data stored as duration (seconds)", 
         subtitle = "breaks = as.duration(seq(0, 5, 1)*60*60*24)\nlabels = function(x)lubridate::day(lubridate::seconds_to_period(x))",
         x = NULL) 
ggsave("lubridate-duration.png")
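
For the breaks in example 3, one possibly shorter spelling is lubridate's ddays() helper, which builds durations from a number of days. The sketch below only constructs the breaks and a matching label function; whether it drops into scale_y_continuous() unchanged is an assumption.

```r
library(lubridate)

# ddays() builds Duration objects from a number of days.
breaks <- ddays(0:5)

# A Duration is stored as seconds, so integer division recovers whole days.
labels <- function(x) as.numeric(x) %/% (24 * 60 * 60)

labels(breaks)
```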


# 4: data is hms/hms
# Pro: Immediately generates plot with acceptable labels
# Unresolved: how to make own labels. Failed attempts:
# labels = 0:(length(breaks)-1)
# labels = function(x)lubridate::day(x)

breaks = seq(0, 5, 1)*60*60*24
ggplot(data = df, aes(x = x, y = hms)) +
    geom_bar(stat = "identity", fill = "lightblue") +
    scale_y_continuous(name = "Duration (Seconds)",
        breaks = breaks) +
    labs(title = "Data stored as hms (seconds)", 
         subtitle = "breaks = seq(0, 5, 1)*60*60*24\ndefault labels",
         x = NULL) 
ggsave("hms-hms.png")


EDIT: Following Axeman's suggestion in the comments, this is how to combine ggplot2 with hms objects. It looks to me like the most convenient of the four approaches, though having to subtract 1 is admittedly unexpected. Axeman, do you want to post this as an answer?

breaks = hms::hms(days = 0:4)
labels = function(x) lubridate::day(x)-1


Upvotes: 2

Views: 1744

Answers (1)

Uwe

Reputation: 42544

IMHO, the proposed solutions look overly complicated.

If durations are given as integer seconds and need to be plotted on a day scale, my approach is to scale it in the call to aes():

df = data.frame(x = 1:6, 
                num = c(374400, 343500, 174000, 193500, 197700, 270300))
library("ggplot2")
ggplot(data = df, aes(x = x, y = num / (24*60*60))) +
  geom_col(fill = "lightblue") +
  labs(title = "Data stored as numeric (seconds)",
       y = "Duration (Days)",
       x = NULL) 


So, no need to fiddle with breaks and labels.

N.B.: geom_col() is a replacement for geom_bar(stat = "identity").

Upvotes: 2
