temp

Reputation: 82

Plot multiple group-wise means in R

My data (from a CSV file) looks like the following. I need to create a line plot or bar plot of the average val for each group, i.e., for each combination of status and category.
The data in dput format:

df <-
structure(list(val = c(4608, 4137, 6507, 5124, 
3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124, 
3608, 3437, 5507, 5507, 5624), status = c("1x", 
"1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy", 
"50xy", "50xy", "60xy", "60xy", "70xy", "xyz", 
"xyz", "xyz"), category = c("A", "C", "A", "A", 
"A", "B", "B", "C", "B", "C", "A", "B", "C", 
"B", "B", "C", "C")), row.names = c(NA, 
-17L), class = "data.frame")

I tried the following code but could not figure out the whole thing.

library(ggplot2)
ggplot(df, aes(x = status, y = val, group = category, color = source)) + 
      geom_smooth(method = "loess")

Any help plotting the group-wise means (for example, the mean val for status 2x and category B) in a single plot would be much appreciated. Thank you.

Upvotes: 1

Views: 1193

Answers (2)

Uwe

Reputation: 42544

This question already has an accepted answer, which requires computing the aggregated mean(val) by status and category beforehand.

However, ggplot2 includes statistical transformations (stats) that let us create the desired plot in one go, without using other packages:

library(ggplot2)
ggplot(df, aes(x = status, y = val, group = category, colour = category)) +
  stat_summary(geom = "line", fun = "mean")  # fun replaces fun.y, deprecated since ggplot2 3.3.0

This creates a line plot of the mean values as requested by the OP:

[Image: line plot of mean val by status, one line per category]

Alternatively, we can tell geom_line() to use a summary statistic:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_line(stat = "summary", fun = "mean")

which creates the same plot.
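
As a sanity check on what stat_summary() draws, the same group means can be computed directly. This is a minimal sketch in base R; the name means is only illustrative, and df is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the question's data so the snippet is self-contained
df <- data.frame(
  val = c(4608, 4137, 6507, 5124, 3608, 34377, 5507, 5624, 4608,
          4137, 6507, 5124, 3608, 3437, 5507, 5507, 5624),
  status = c("1x", "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
             "50xy", "50xy", "60xy", "60xy", "70xy", "xyz", "xyz", "xyz"),
  category = c("A", "C", "A", "A", "A", "B", "B", "C", "B",
               "C", "A", "B", "C", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# One row per observed status/category combination, holding mean(val)
means <- aggregate(val ~ status + category, data = df, FUN = mean)
print(means)
```

Each row of means should correspond to one point on the lines in the plot. Combinations that never occur in the data (for example, status 60xy with category A) have no row and therefore no point.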

stat_summary() can also be used to show the original data and the summary statistics combined in one plot:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_point() +
  stat_summary(geom = "line", fun = "mean")

[Image: the same line plot of means with the individual data points overlaid]

This can help to better understand the structure of the underlying data, e.g., to spot outliers. Note that the y scale differs from the plot of the means alone.
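
For the data in the question, the value that stands far apart from the rest can also be located numerically. The following base R sketch (variable names are illustrative, and df is rebuilt so the snippet runs on its own) finds the largest val and shows how much it shifts the mean of its group:

```r
# Rebuild the question's data so the snippet is self-contained
df <- data.frame(
  val = c(4608, 4137, 6507, 5124, 3608, 34377, 5507, 5624, 4608,
          4137, 6507, 5124, 3608, 3437, 5507, 5507, 5624),
  status = c("1x", "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
             "50xy", "50xy", "60xy", "60xy", "70xy", "xyz", "xyz", "xyz"),
  category = c("A", "C", "A", "A", "A", "B", "B", "C", "B",
               "C", "A", "B", "C", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Row holding the largest val
outlier <- df[which.max(df$val), ]
print(outlier)

# All values in the same status/category group as that row
grp <- df$val[df$status == outlier$status & df$category == outlier$category]

mean_with_outlier    <- mean(grp)                      # the mean as plotted
mean_without_outlier <- mean(grp[grp != outlier$val])  # the mean without the largest value
print(c(mean_with_outlier, mean_without_outlier))
```

Whether that value is a genuine measurement or a data-entry error cannot be decided from the data alone, so the sketch only quantifies its influence on the group mean.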

Upvotes: 2

YOLO

Reputation: 21709

You can do:

library(dplyr)
library(ggplot2)
df %>%
    group_by(category, status) %>%
    summarise(agg = mean(val)) %>%  # one row per group instead of repeating the mean on every row
    ggplot(., aes(status, agg, fill = category, color=status))+
    geom_col(position = "dodge")
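
If dplyr is not available, a similar grouped bar chart can be sketched in base R. This is an alternative to the code above, not a reproduction of it: tapply() builds a category-by-status table of means and barplot() draws the bars side by side. The name m is illustrative, and df is rebuilt so the snippet runs on its own.

```r
# Rebuild the question's data so the snippet is self-contained
df <- data.frame(
  val = c(4608, 4137, 6507, 5124, 3608, 34377, 5507, 5624, 4608,
          4137, 6507, 5124, 3608, 3437, 5507, 5507, 5624),
  status = c("1x", "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
             "50xy", "50xy", "60xy", "60xy", "70xy", "xyz", "xyz", "xyz"),
  category = c("A", "C", "A", "A", "A", "B", "B", "C", "B",
               "C", "A", "B", "C", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Matrix of mean(val): rows = category, columns = status;
# cells are NA where a combination does not occur in the data
m <- tapply(df$val, list(category = df$category, status = df$status), mean)
print(m)

# Grouped bars, one cluster per status; NA cells are left empty
barplot(m, beside = TRUE, legend.text = rownames(m),
        xlab = "status", ylab = "mean val")
```

The wide table printed by this sketch also makes it easy to see which status/category combinations are missing.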

Upvotes: 2
