Reputation: 10510
Question: Your advice for handling durations with ggplot2
(author: Hadley Wickham). Specifically: I would like to reproduce the plots below with custom breaks and suitable labels, preferably with minimal use of custom functions and/or refactoring of the data. Suggestions involving packages I have not cited are welcome.
The data is stored in seconds (see df below). I would like to display human-readable breaks and labels, e.g. days instead of thousands of seconds, with the breaks at 0, 1, 2, ... days instead of awkward fractions.
Proof of effort: The first example below stores the durations as plain numbers (seconds) and achieves the objective by dividing, case by case, by the appropriate multiples of 60, 24, 365, etc. The second example uses base R difftime objects. To get it right in this case, I had to use the strftime function and subtract 1. Have I missed something? The third example uses the duration class from the lubridate package. While specifying labels was quite easy with the day() and seconds_to_period() functions, I was less successful at setting custom breaks. The fourth example uses the hms class. I managed to specify breaks, but not labels. Suggestions on how to write shorter code for each of the examples below are also welcome.
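A small base R sketch of where the "- 1" in the second example plausibly comes from. It assumes that formatting with "%d" treats the duration as a date-time counted from the Unix epoch, 1970-01-01 00:00:00 UTC (my reading, not something the question confirms); day_of_month() is a hypothetical helper. Under that assumption the day of the month starts at 1 rather than 0, and it wraps around after 31 days. The time zone is forced to UTC here, because formatting in a local time zone west of UTC would shift zero seconds back to 31 December 1969.

```r
# Sketch (assumption): a "%d" format treats a duration in seconds as a
# date-time counted from the Unix epoch, 1970-01-01 00:00:00 UTC.
secs_per_day <- 24 * 60 * 60

# Hypothetical helper: day of the month of the epoch-based date-time
day_of_month <- function(secs) {
  t <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
  as.integer(format(t, "%d", tz = "UTC"))
}

days <- c(0, 1, 4, 31, 32)
secs <- days * secs_per_day

dom <- day_of_month(secs)
print(dom)      # 1 2 5 1 2: the day of the month starts at 1
print(dom - 1)  # 0 1 4 0 1: correct only for durations under 31 days

# Integer division by the length of a day has neither problem
whole_days <- secs %/% secs_per_day
print(whole_days)  # 0 1 4 31 32
```

If this reading is right, the "- 1" correction only holds while every duration stays under 31 days.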
# Data
df = data.frame(x = 1:6,
                num = c(374400, 343500, 174000, 193500, 197700, 270300))
# base/difftime
df$difftime <- as.difftime(df$num, units = "secs")
# lubridate/duration
library("lubridate") # devtools::install_github("tidyverse/lubridate") # the dev version fixes a bug
df$duration <- duration(df$num, units = "seconds")
# hms/hms
library("hms")
df$hms <- as_hms(df$num) # as_hms() replaces the deprecated as.hms() since hms 0.5.0
library("ggplot2")
library("scales")
# 1: data is base/numeric
# Pro: no package dependence
# Con: Hard work
breaks = seq(0, 5*24*60*60, 24*60*60) # one break per day, in seconds
labels = function(x) round(x/60/60/24, 0)
ggplot(data = df, aes(x = x, y = num)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_y_continuous(name = "Duration (Days)",
                     breaks = breaks,
                     labels = labels) +
  labs(title = "Data stored as numeric (seconds)",
       subtitle = "breaks = seq(0, 5*24*60*60, 24*60*60)\nlabels = function(x) round(x/60/60/24, 0)",
       x = NULL)
ggsave("base-num.png")
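A sketch that avoids choosing the break spacing by hand in the first example. The two helpers, day_breaks() and day_labels(), are hypothetical names of my own, not from any package: one computes breaks at whole days covering the data, the other converts seconds to days. It relies on ggplot2 accepting a function for breaks, which is called with the scale limits.

```r
# Hypothetical helpers (not from any package): whole-day breaks and labels
# for durations stored as numeric seconds.
secs_per_day <- 24 * 60 * 60

# Breaks at 0, 1, 2, ... days, extended to cover the largest limit
day_breaks <- function(limits) {
  seq(0, ceiling(max(limits) / secs_per_day)) * secs_per_day
}

# Labels: convert seconds to days
day_labels <- function(x) x / secs_per_day

num <- c(374400, 343500, 174000, 193500, 197700, 270300)
br <- day_breaks(range(num))
print(br)
print(day_labels(br))
```

In the first example they could then be passed as scale_y_continuous(breaks = day_breaks, labels = day_labels); breaks beyond the plotted range are simply not drawn.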
# 2: data is base/difftime
# Pro: simple once you get over the ``strftime(x, "%d")`` syntax.
# Unresolved: Why do I need to subtract a day?
labels = function(x) as.integer(strftime(x, "%d"))-1
ggplot(data = df, aes(x = x, y = difftime)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_y_time(name = "Duration (Days)",
               labels = labels) +
  labs(title = "Data stored as difftime (seconds)",
       subtitle = "default breaks\nlabels = function(x) as.integer(strftime(x, '%d'))-1",
       x = NULL)
ggsave("base-difftime.png")
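A base R sketch of converting difftime values to days directly, as a possible alternative to the strftime() labels in the second example. It only shows the conversion itself; whether scale_y_time() hands a difftime to the labels function is an assumption I have not checked, so the most robust use is probably to convert the column before plotting, e.g. aes(y = as.numeric(difftime, units = "days")).

```r
# Base R only: a difftime carries its units, so it can be converted to days
# without going through strftime().
d <- as.difftime(c(374400, 343500, 174000), units = "secs")

# Option 1: numeric value in the requested units
days <- as.numeric(d, units = "days")
print(round(days, 2))

# Option 2: change the units of the difftime itself
units(d) <- "days"
print(d)
```

Unlike a day-of-month format, this keeps counting past 31 days and keeps fractional days.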
# 3: data is lubridate/duration
# Pro: intuitive combination of day() and seconds_to_period() functions
# Unresolved: a better way to make my own breaks?
breaks = as.duration(seq(0, 5, 1)*60*60*24)
labels = function(x) day(seconds_to_period(x))
ggplot(data = df, aes(x = x, y = duration)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_y_continuous(name = "Duration (Days)",
                     breaks = breaks,
                     labels = labels) +
  labs(title = "Data stored as duration (seconds)",
       subtitle = "breaks = as.duration(seq(0, 5, 1)*60*60*24)\nlabels = function(x)lubridate::day(lubridate::seconds_to_period(x))",
       x = NULL)
ggsave("lubridate-duration.png")
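A possibly shorter way to build the breaks for the third example, assuming lubridate's ddays() constructor, which creates Duration objects from a number of days. The sketch only checks that both forms hold the same number of seconds; the labels function from the question is left unchanged.

```r
library(lubridate)

# Breaks as written in the third example
breaks_long <- as.duration(seq(0, 5, 1) * 60 * 60 * 24)

# Same breaks built directly from a number of days
breaks_short <- ddays(0:5)

# Both hold 0, 86400, ..., 432000 seconds
print(as.numeric(breaks_long))
print(as.numeric(breaks_short))
```

If the two vectors agree, breaks = ddays(0:5) should be a drop-in replacement in the scale_y_continuous() call.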
# 4: data is hms/hms
# Pro: Immediately generates plot with acceptable labels
# Unresolved: how to make my own labels. Failed attempts:
labels = 0:(length(breaks)-1)
labels = function(x) lubridate::day(x)
breaks = seq(0, 5, 1)*60*60*24
ggplot(data = df, aes(x = x, y = hms)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_y_continuous(name = "Duration (Seconds)",
                     breaks = breaks) +
  labs(title = "Data stored as hms (seconds)",
       subtitle = "breaks = seq(0, 5, 1)*60*60*24\ndefault labels",
       x = NULL)
ggsave("hms-hms.png")
EDIT: Following Axeman's suggestion in the comments, this is how to combine ggplot with hms objects. To me this looks like the most convenient of the four, although having to subtract 1 is admittedly unexpected. Axeman, do you want to post this as an answer?
breaks = hms::hms(days = 0:4)
labels = function(x) lubridate::day(x)-1
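A sketch of a labels function for the hms case that avoids the "- 1", assuming (as I read it) that lubridate::day() counts days of the month, which start at 1. Because hms objects store seconds, dividing by the length of a day gives 0, 1, 2, ... directly; labels_days() is a hypothetical helper. hms::hms() is called with its namespace prefix because lubridate also exports a different hms() function.

```r
# hms objects store seconds, so these breaks are 0, 86400, ..., 345600
breaks <- hms::hms(days = 0:4)
print(as.numeric(breaks))

# Hypothetical labels function: seconds divided by the length of a day,
# which yields 0, 1, 2, ... without a "- 1" correction
labels_days <- function(x) as.numeric(x) / (24 * 60 * 60)
print(labels_days(breaks))
```

Because it only uses as.numeric(), the same function would also accept the plain numeric breaks that scale_y_continuous() passes to labels.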
Upvotes: 2
Views: 1744
Reputation: 42544
IMHO, the proposed solutions are overly complicated.
If durations are given as integer seconds and need to be plotted on a day scale, my approach is to rescale them in the call to aes():
df = data.frame(x = 1:6,
                num = c(374400, 343500, 174000, 193500, 197700, 270300))
library("ggplot2")
ggplot(data = df, aes(x = x, y = num / (24*60*60))) +
  geom_col(fill = "lightblue") +
  labs(title = "Data stored as numeric (seconds)",
       y = "Duration (Days)",
       x = NULL)
So there is no need to fiddle with breaks and labels.
N.B.: geom_col() is a shorthand for geom_bar(stat = "identity").
Upvotes: 2