Reputation: 3190
I am trying to visualize the offset of a time series from its baseline value using bar charts from R's ggplot2 package. For instance, take the following synthetic data:
baseline = 400
steps <- sample(0:10,50,replace=TRUE) - sample(0:10,50,replace=TRUE)
value <- cumsum(steps) + baseline
time = 1:50
data <- data.frame(time,value)
print(value)
[1] 400 400 397 397 393 400 394 395 389 389 385 395 400 399 405 403 399 401 399 401
[21] 401 401 398 397 395 395 401 402 393 400 399 398 406 412 417 413 410 401 400 399
[41] 394 401 406 406 401 404 411 413 404 402
I can draw the chart in its original scale, but this is not really informative:
longdata <- ddply( data, "value", transform, posneg=sign(value-baseline) )
longdata[longdata$posneg == 0,'posneg'] <- 1
p_aes <- aes( time, value, fill=factor(posneg))
p_scale <- scale_fill_brewer( palette='Set1', guide=FALSE )
p_geom <- geom_bar( stat='identity', position='identity' )
ggplot(longdata) + p_aes + p_scale + p_geom
By shifting the aesthetic along the y axis (i.e., y = value - baseline) I obtain the chart I want to show, which is nice and easy.
longdata <- ddply( data, "value", transform, posneg=sign(value-baseline) )
longdata[longdata$posneg == 0,'posneg'] <- 1
p_aes <- aes( time, value-baseline, fill=factor(posneg))
p_scale <- scale_fill_brewer( palette='Set1', guide=FALSE )
p_geom <- geom_bar( stat='identity', position='identity' )
ggplot(longdata) + p_aes + p_scale + p_geom
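The posneg column used in both charts can also be computed without plyr. A minimal sketch in base R, assuming the same data frame layout and baseline as above; the fixed seed and the offset column are additions for illustration:

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
baseline <- 400
steps <- sample(0:10, 50, replace = TRUE) - sample(0:10, 50, replace = TRUE)
data <- data.frame(time = 1:50, value = cumsum(steps) + baseline)

# +1 for values at or above the baseline, -1 for values below it
data$posneg <- ifelse(data$value >= baseline, 1, -1)
# offset from the baseline, which is what the second chart puts on the y axis
data$offset <- data$value - baseline
```

Treating values exactly at the baseline as positive matches the `posneg == 0` replacement in the code above.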
Unfortunately, the y axis now shows the offset from the baseline, i.e. "value - baseline". However, I want the y axis to keep the original values (the ones between 380 and 420).
Is there any way to preserve the original y axis scale for the second chart? Do you have any other advice on visualizing the difference from a target value?
Upvotes: 3
Views: 1197
Reputation: 5089
Another solution, instead of changing the y axis labels post hoc, is to use geom_linerange and then make the lines wide enough for the particular plot (the geoms for crossbars or error bars may be suitable as well).
p <- ggplot(data=longdata, aes(x = time, color = factor(posneg))) +
geom_linerange(aes(ymax = value, ymin = baseline), size = 3) +
scale_color_brewer( palette='Set1', guide=FALSE )
p
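If the bars need an exact width rather than a tuned line size, the same idea can be written with geom_rect. This is only a sketch and not part of the answer above: the small data frame df, the half-width of 0.45, and the name p_rect are assumptions chosen for illustration.

```r
library(ggplot2)

baseline <- 400
# small illustrative data set (assumed values, not the question's data)
df <- data.frame(time = 1:6, value = c(400, 397, 393, 405, 412, 399))
df$posneg <- ifelse(df$value >= baseline, 1, -1)

# each rectangle spans from the baseline to the value on the original y scale
p_rect <- ggplot(df, aes(fill = factor(posneg))) +
  geom_rect(aes(xmin = time - 0.45, xmax = time + 0.45,
                ymin = pmin(value, baseline), ymax = pmax(value, baseline))) +
  scale_fill_brewer(palette = "Set1", guide = "none") +
  labs(x = "time", y = "value")
p_rect
```

As with geom_linerange, a value exactly at the baseline yields a zero-height rectangle and is not visible.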
This is a perfectly reasonable plot to make: as you show, starting the axis at zero hides almost all of the variation. However, the convention that bar charts start at zero is so strong that the plot can be misread. Also, values exactly at the baseline produce no bar at all, which makes them look like missing data.
A line plot with a horizontal line marking the baseline shows the same information, and no color is needed.
p2 <- ggplot(data=longdata, aes(x = time, y = value)) +
geom_line() + geom_point() + geom_hline(yintercept=baseline)
p2
Upvotes: 4
Reputation: 1299
Add a function that converts each offset back to the original value:
yaxis_format <- function(x){
  baseline + x
}
and then pass it to scale_y_continuous(labels = yaxis_format) so that the axis labels are computed from the offsets:
ggplot(longdata) + p_aes + p_scale + p_geom + scale_y_continuous(labels=yaxis_format)
The final code should look like this:
library(ggplot2)
library(plyr)
set.seed(201)
baseline = 400
steps <- sample(0:10,50,replace=TRUE) - sample(0:10,50,replace=TRUE)
value <- cumsum(steps) + baseline
time = 1:50
data <- data.frame(time,value)
# Label each y axis break with the original value (offset + baseline)
yaxis_format <- function(x){
  baseline + x
}
longdata <- ddply( data, "value", transform, posneg=sign(value-baseline) )
longdata[longdata$posneg == 0,'posneg'] <- 1
p_aes <- aes( time, value-baseline, fill=factor(posneg))
p_scale <- scale_fill_brewer( palette='Set1', guide=FALSE )
p_geom <- geom_bar( stat='identity', position='identity' )
# The trailing + keeps ylab() in the same expression; a line starting with + is a syntax error
ggplot(longdata) + p_aes + p_scale + p_geom + scale_y_continuous(labels=yaxis_format) +
  ylab("Value")
Note that the label function must add the baseline to the offset. Subtracting instead (400 - x) mirrors the labels around the baseline, and switching to scale_y_reverse does not repair this: it flips the bars as well, so a value above the baseline is still labelled as if it were below it.
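A label function given to a continuous scale is applied to the break positions in data units, which here are the offsets value - baseline, so it can be checked on a few offsets before plotting. A minimal sketch; the names add_baseline and subtract_from_baseline are hypothetical and only contrast the two directions:

```r
baseline <- 400
offsets <- c(-10, 0, 10)  # example break positions on the shifted axis

# adds the baseline back, recovering the original values
add_baseline <- function(x) baseline + x
# subtracts the offset instead, which mirrors the labels around the baseline
subtract_from_baseline <- function(x) baseline - x

add_baseline(offsets)            # 390 400 410
subtract_from_baseline(offsets)  # 410 400 390
```

Only the first form labels an offset of +10 as 410, which is the original value it stands for.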
Upvotes: 5