shaun

Reputation: 570

Boxplot of CSV data with ggplot2

I have a CSV file of weights taken every day for six months (August 2016 - January 2017). I would like to draw a boxplot for each month that essentially plots the summary() of that month's data. I would like to use ggplot2 because it looks much nicer. I've searched for a solution and found many, but none that does what I want.

The head and summary of the data:

> wts <- read.csv('weights.csv', header=T, sep=',')
> head(wts)
  August.2016 September.2016 October.2016 November.2016 December.2016 January.2016
1       254.2          250.0        248.2         245.8         245.6        244.4
2       252.6          249.2        248.6         246.4         246.0        245.0
3       251.8          250.6        249.2         248.0         246.4        244.3
4       253.2          252.4        249.8         247.5         246.0        243.6
5       252.2          250.6        248.8         247.0         246.0        242.6
6       254.0          251.0        247.8         247.6         246.0        242.0
> summary(wts)
  August.2016    September.2016   October.2016   November.2016   December.2016    January.2016  
 Min.   :249.6   Min.   :245.6   Min.   :245.4   Min.   :244.2   Min.   :243.4   Min.   :241.6  
 1st Qu.:252.2   1st Qu.:248.3   1st Qu.:246.7   1st Qu.:246.2   1st Qu.:244.8   1st Qu.:242.9  
 Median :252.8   Median :249.2   Median :247.8   Median :246.6   Median :245.6   Median :243.6  
 Mean   :252.7   Mean   :249.1   Mean   :247.6   Mean   :246.7   Mean   :245.3   Mean   :243.5  
 3rd Qu.:253.6   3rd Qu.:250.0   3rd Qu.:248.2   3rd Qu.:247.2   3rd Qu.:246.0   3rd Qu.:244.3  
 Max.   :255.2   Max.   :252.4   Max.   :249.8   Max.   :248.6   Max.   :247.0   Max.   :245.0  
                 NA's   :1                       NA's   :1                       NA's   :1  

From what I've gathered, I need to reshape the data into a form that ggplot likes, but I'm not sure how to do it. I would also like to highlight the mean (with the actual number) on the boxplot, if possible. Could I get an idea of how to do it?

Thanks

Upvotes: 0

Views: 1442

Answers (1)

mtoto

Reputation: 24198

To stay within the tidyverse, you can use gather() from the tidyr package to reshape your data into long format (one row per observation, with the month in a key column and the weight in a value column) and pass the result to ggplot(). To print the mean on each box, use stat_summary() with the "text" geom and the mean function applied to the value variable.

library(tidyr)
library(ggplot2)

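# gather() stacks the month columns into key/value pairs;
# factor_key = TRUE keeps the months in their original column order.
ggplot(gather(wts, factor_key = TRUE),
       aes(key, value)) +
    geom_boxplot() +
    # Label each box with its mean, rounded to two decimals.
    # With ggplot2 >= 3.3.0, fun and after_stat(y) replace the
    # deprecated fun.y and ..y.. spellings.
    stat_summary(aes(label = after_stat(y)),
                 fun = function(x) round(mean(x), 2),
                 geom = "text",
                 size = 3,
                 color = "red")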
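
If it helps to see what "long format" means before plotting, here is a minimal sketch in base R only. The data frame toy is made up for illustration; it holds the first three rows of two columns from the head(wts) output in the question. stack() is used here as a stand-in for gather(): it returns columns named values and ind rather than value and key, but the shape is the same.

```r
# Toy data (hypothetical name): first three rows of two columns
# taken from the head(wts) output shown in the question.
toy <- data.frame(August.2016    = c(254.2, 252.6, 251.8),
                  September.2016 = c(250.0, 249.2, 250.6))

# stack() reshapes wide to long: one row per observation, with the
# source column name stored in the factor `ind`.
long <- stack(toy)
print(long)

# Per-month means, rounded the same way as the labels on the plot.
month_means <- aggregate(values ~ ind, data = long,
                         FUN = function(x) round(mean(x), 2))
print(month_means)
```

The ind factor keeps the months in column order, which is the same thing factor_key = TRUE achieves with gather(); without that, the boxes would likely be sorted alphabetically along the x axis.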

Upvotes: 2
