Averaging rows by multiple column variables in R

Question

I am trying to create a multiple bar chart of my data, depicting the mean of avgct for each region with error bars using ggplot2.

Here is a sample of my data:

gregion  lregion  avgct
1        e        1.146
1        e        0.947
2        e        0.908
3        e        1.167
1        t        1.225
2        t        1.058
2        t        2.436
3        t        0.679

So far I have managed to create this graph, but it seems to be plotting the maximum value of avgct for each region rather than the mean, so I cannot add error bars.


I think I need to calculate the mean of avgct by gregion and lregion so that I have an average value of avgct for each region, like this:

gregion  lregion  mean(avgct)
1        e        1.047
2        e        0.908
3        e        1.167
1        t        1.225
2        t        1.747
3        t        0.679

If anyone can help me with this so that I can plot a barchart of averages with error bars for my data it would be very much appreciated!

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

This is a basic aggregation question, so the typical starting point should be aggregate:

> aggregate(avgct ~ gregion + lregion, mydf, mean)
  gregion lregion  avgct
1       1       e 1.0465
2       2       e 0.9080
3       3       e 1.1670
4       1       t 1.2250
5       2       t 1.7470
6       3       t 0.6790
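
To reproduce the output above, here is a minimal sketch that rebuilds the sample from the question as mydf. The data.frame construction is an assumption about how the data is stored, and the names means, stats and se are only illustrative. It also extends the same aggregate call to return the standard deviation and group size, which error bars would need:

```r
# Sample data from the question, rebuilt by hand (assumed structure)
mydf <- data.frame(
  gregion = c(1, 1, 2, 3, 1, 2, 2, 3),
  lregion = c("e", "e", "e", "e", "t", "t", "t", "t"),
  avgct   = c(1.146, 0.947, 0.908, 1.167, 1.225, 1.058, 2.436, 0.679)
)

# Mean per gregion/lregion combination, as in the answer
means <- aggregate(avgct ~ gregion + lregion, mydf, mean)

# Mean, standard deviation and group size in one call.
# aggregate() returns these as a matrix column, so flatten it afterwards.
stats <- aggregate(avgct ~ gregion + lregion, mydf,
                   function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
stats <- do.call(data.frame, stats)  # columns: avgct.mean, avgct.sd, avgct.n

# Standard error of the mean (one possible choice for error bars)
stats$se <- stats$avgct.sd / sqrt(stats$avgct.n)
stats
```

Note that sd() returns NA for a group with a single observation, so in this small sample only the groups with two rows get a usable standard error.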

There are, however, several other alternatives, including "dplyr" and "data.table", that may be more appealing in the long run for convenience of syntax and overall efficiency.

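library(data.table)
# Group by both region columns; name the result so it is not the default V1
as.data.table(mydf)[, .(avgct = mean(avgct)), by = .(gregion, lregion)]

library(dplyr)
# Same grouping with dplyr; summarise() returns one row per gregion/lregion pair
mydf %>% group_by(gregion, lregion) %>% summarise(avgct = mean(avgct))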
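
Once the means are available, the bar chart with error bars asked about in the question could be sketched roughly as follows. This is not from the original answer: using the standard error for the bars, dodging bars by lregion, and the names summ and p are all assumptions made for illustration.

```r
library(ggplot2)

# Sample data from the question, rebuilt by hand (assumed structure)
mydf <- data.frame(
  gregion = c(1, 1, 2, 3, 1, 2, 2, 3),
  lregion = c("e", "e", "e", "e", "t", "t", "t", "t"),
  avgct   = c(1.146, 0.947, 0.908, 1.167, 1.225, 1.058, 2.436, 0.679)
)

# One row per region pair with its mean and standard error
summ <- aggregate(avgct ~ gregion + lregion, mydf,
                  function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x))))
summ <- do.call(data.frame, summ)  # columns: avgct.mean, avgct.se

# Dodged bars of the means, with mean +/- se error bars on top
dodge <- position_dodge(width = 0.9)
p <- ggplot(summ, aes(x = factor(gregion), y = avgct.mean, fill = lregion)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = avgct.mean - avgct.se,
                    ymax = avgct.mean + avgct.se),
                position = dodge, width = 0.2) +
  labs(x = "gregion", y = "mean avgct")
p
```

Groups with only one observation have no standard error, so ggplot2 will drop their error bars with a warning; with the full data set every group should have enough rows.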
