Reputation: 1393
The source of this data is server performance metrics. The numbers I have are the mean (os_cpu) and standard deviation (os_cpu_sd). The mean clearly doesn't tell the whole story, so I want to add the standard deviation. I started down the path of geom_errorbar, but I believe that is intended for standard error. What would be an accepted way to plot these metrics? Below is a reproducible example:
DF_CPU <- structure(list(end = structure(c(1387315140, 1387316340, 1387317540,
1387318740, 1387319940, 1387321140, 1387322340, 1387323540, 1387324740,
1387325940, 1387327140, 1387328340, 1387329540, 1387330740, 1387331940,
1387333140, 1387334340, 1387335540, 1387336740, 1387337940, 1387339140,
1387340340, 1387341540, 1387342740, 1387343940, 1387345140, 1387346340,
1387347540, 1387348740, 1387349940), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), os_cpu = c(14.8, 15.5, 17.4, 15.6, 14.9, 14.6,
15, 15.2, 14.6, 15.2, 15, 14.5, 14.8, 15, 14.6, 14.9, 14.9, 14.4,
14.8, 14.9, 14.5, 15, 14.6, 14.5, 15.3, 14.6, 14.6, 15.2, 14.5,
14.5), os_cpu_sd = c(1.3, 2.1, 3.2, 3.3, 0.9, 0.4, 1.4, 1.5,
0.4, 1.6, 1, 0.4, 1.4, 1.4, 0.4, 1.3, 0.9, 0.4, 1.4, 1.3, 0.4,
1.7, 0.4, 0.4, 1.7, 0.4, 0.4, 1.7, 0.5, 0.4)), .Names = c("end",
"os_cpu", "os_cpu_sd"), class = "data.frame", row.names = c(1L,
5L, 9L, 13L, 17L, 21L, 25L, 29L, 33L, 37L, 41L, 45L, 49L, 53L,
57L, 61L, 65L, 69L, 73L, 77L, 81L, 85L, 89L, 93L, 97L, 101L,
105L, 109L, 113L, 117L))
head(DF_CPU)
end os_cpu os_cpu_sd
1 2013-12-17 21:19:00 14.8 1.3
5 2013-12-17 21:39:00 15.5 2.1
9 2013-12-17 21:59:00 17.4 3.2
13 2013-12-17 22:19:00 15.6 3.3
17 2013-12-17 22:39:00 14.9 0.9
ggplot(data = DF_CPU, aes(x = end, y = os_cpu)) +
  geom_line() +
  geom_errorbar(aes(ymin = os_cpu - os_cpu_sd, ymax = os_cpu + os_cpu_sd),
                alpha = 0.2, color = "red")
Per @ari-b-friedman's suggestion, here's what it looks like with geom_ribbon():
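A minimal sketch of that geom_ribbon() variant (using a few rows of the data above; the fill colour and alpha are illustrative choices, not from the original post):

```r
library(ggplot2)

# A few rows of the question's data, enough to draw the ribbon
DF_CPU <- data.frame(
  end       = as.POSIXct(c(1387315140, 1387316340, 1387317540, 1387318740),
                         origin = "1970-01-01", tz = "UTC"),
  os_cpu    = c(14.8, 15.5, 17.4, 15.6),
  os_cpu_sd = c(1.3, 2.1, 3.2, 3.3)
)

# Shaded band of mean +/- one SD, with the mean drawn as a line on top
p <- ggplot(DF_CPU, aes(x = end, y = os_cpu)) +
  geom_ribbon(aes(ymin = os_cpu - os_cpu_sd, ymax = os_cpu + os_cpu_sd),
              fill = "red", alpha = 0.2) +
  geom_line()
```

The ribbon reads as a continuous envelope rather than discrete error bars, which suits regularly sampled time-series data.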
Upvotes: 2
Views: 1492
Reputation: 59395
Your question is largely about aesthetics, and so opinions will differ. Having said that, there are some guidelines. So this code:
ggplot(data = DF_CPU, aes(x = end, y = os_cpu)) +
  geom_point(size = 3, shape = 1) +
  geom_line(linetype = 2, colour = "grey") +
  geom_linerange(aes(ymin = os_cpu - 1.96 * os_cpu_sd,
                     ymax = os_cpu + 1.96 * os_cpu_sd),
                 alpha = 0.5, color = "blue") +
  ylim(0, max(DF_CPU$os_cpu + 1.96 * DF_CPU$os_cpu_sd)) +
  stat_smooth(formula = y ~ 1, se = TRUE, method = "lm", linetype = 2, size = 1) +
  theme_bw()
Produces this:
This graphic emphasizes that CPU utilization (??) over 20-minute intervals did not deviate significantly from the average for the 9-hour period monitored. The reference line is average utilization. The error bars were replaced with geom_linerange(...) because the horizontal bars in geom_errorbar(...) add nothing and are distracting. Also, your original plot makes it seem that the error is very large compared to actual utilization, which it isn't. I changed the range to +/- 1.96*sd because that more closely approximates a 95% CL. Finally, the x- and y-axis labels need to be replaced with something descriptive, but I don't have enough information to do that.
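For the axis labels, labs() is the usual mechanism; a minimal sketch on toy data (the label wording here is hypothetical, since the answer notes the real labels are unknown):

```r
library(ggplot2)

# Toy data standing in for the real series
toy <- data.frame(x = 1:4, y = c(14.8, 15.5, 17.4, 15.6))

# labs() sets descriptive axis titles; these strings are placeholders
p <- ggplot(toy, aes(x, y)) +
  geom_line() +
  labs(x = "Time (20-minute intervals, UTC)",
       y = "CPU utilization (%)")
```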
Upvotes: 4
Reputation: 94237
There's a designer's adage that "form follows function", and this should apply to graphics. What are you trying to do with your plots? What's the question you are trying to answer?
If it is "is CPU usage significantly decreasing with time?" then this plot will probably do, and it gives the answer "no". If it is "is the probability of exceeding 10s changing with time?" then you need to assume a model for your data (e.g. something as simple as Normal(os_cpu, os_cpu_sd)) and then plot exceedance (tail) probabilities.
Anyway, just plotting means and envelopes like you have done is always a fair start, and at least answers the questions "what does my data look like?" and "is anything obviously wrong?"
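A minimal sketch of that exceedance-probability idea, assuming the Normal(os_cpu, os_cpu_sd) model suggested above (the threshold of 16 is an illustrative choice, not from the original post):

```r
# Per-interval P(CPU > threshold) under a Normal(mean, sd) model
os_cpu    <- c(14.8, 15.5, 17.4, 15.6, 14.9)
os_cpu_sd <- c(1.3, 2.1, 3.2, 3.3, 0.9)
threshold <- 16  # illustrative threshold, not from the question

# Upper-tail probability for each interval; pnorm is vectorized over mean/sd
p_exceed <- pnorm(threshold, mean = os_cpu, sd = os_cpu_sd, lower.tail = FALSE)
```

Plotting p_exceed against time would then answer the "is the probability changing?" version of the question directly.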
Upvotes: 2