wasmetqall

Reputation: 139

How to plot mean of observations grouped by more than one column using ggplot?

Say I have a dataframe:

date ID times value
1   B048669    1    41
2   B048669    1    29
3   B048669    1    37
4   B048669    1    31
5   B048669    1    NA
6   B048669    1    23
1  Y2929021    1    43
2  Y2929021    1    10
3  Y2929021    1    NA
4  Y2929021    1    NA
5  Y2929021    1    29
6  Y2929021    1    NA
1  Y2929021    2    43
2  Y2929021    2    NA
3  Y2929021    2    15
4  Y2929021    2    3
5  Y2929021    2    29
6  Y2929021    2    NA

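I want to calculate the mean of value for each date, taken across every ID & times combination and ignoring NA values. Then x = date, y = mean value. So the first point is x = 1, y = (41 + 43 + 43) / 3, and the second point is x = 2, y = (29 + 10) / 2, since the NA is dropped.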

Upvotes: 0

Views: 1248

Answers (2)

Jacqueline Nolis

Reputation: 1547

Use dplyr to aggregate the data and ggplot2 to plot it. Both are part of the tidyverse, which is worth reading up on in general because its packages are powerful and easy to use. Assuming your data is in a data frame df:

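library(dplyr)
library(ggplot2)

# Average value per date across all ID/times rows, dropping NAs
aggregated_df <-
  df %>%
  group_by(date) %>%
  summarize(value = mean(value, na.rm = TRUE))

# One bar per date, with bar height equal to the mean
ggplot(aggregated_df, aes(x = date, y = value)) + geom_col()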

The default plot is not particularly attractive, but you can modify the style to your heart's content:

[Image: example plot]
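To check the aggregation against the numbers in the question, here is a minimal reproducible sketch. It assumes the data frame is rebuilt by hand from the table in the question (the name daily_means is only illustrative) and that dplyr is installed:

```r
library(dplyr)

# Data copied from the table in the question
df <- data.frame(
  date  = rep(1:6, times = 3),
  ID    = c(rep("B048669", 6), rep("Y2929021", 12)),
  times = c(rep(1, 12), rep(2, 6)),
  value = c(41, 29, 37, 31, NA, 23,
            43, 10, NA, NA, 29, NA,
            43, NA, 15,  3, 29, NA)
)

# Mean of value per date; na.rm = TRUE drops the NA entries
daily_means <- df %>%
  group_by(date) %>%
  summarize(value = mean(value, na.rm = TRUE))

print(daily_means)
```

The first two rows should match the values asked for: (41 + 43 + 43) / 3 for date 1 and (29 + 10) / 2 for date 2.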

Upvotes: 2

Jordi

Reputation: 1343

You may want to calculate the mean before calling ggplot. Using dplyr:

# na.rm = TRUE drops the NA entries; without it every group mean here is NA
df <- df %>%
    group_by(ID, times) %>%
    summarize(mean = mean(value, na.rm = TRUE))

Then call ggplot with the mapping aes(y = mean) and whatever aesthetics you want to map ID and times to.
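As one possible way to map ID and times, the sketch below puts ID on the x axis and uses times as the fill colour of dodged bars. The data frame is rebuilt from the question's table, and the name group_means and the choice of geom_col are assumptions made for illustration, not the only sensible mapping:

```r
library(dplyr)
library(ggplot2)

# Data copied from the table in the question
df <- data.frame(
  date  = rep(1:6, times = 3),
  ID    = c(rep("B048669", 6), rep("Y2929021", 12)),
  times = c(rep(1, 12), rep(2, 6)),
  value = c(41, 29, 37, 31, NA, 23,
            43, 10, NA, NA, 29, NA,
            43, NA, 15,  3, 29, NA)
)

# One mean per ID/times combination, ignoring NAs
group_means <- df %>%
  group_by(ID, times) %>%
  summarize(mean = mean(value, na.rm = TRUE), .groups = "drop")

# ID on the x axis, times as fill, bars placed side by side
p <- ggplot(group_means, aes(x = ID, y = mean, fill = factor(times))) +
  geom_col(position = "dodge")

print(p)
```

Note that grouping by ID and times gives one value per combination rather than one per date, so date is no longer available as an axis after the summary.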

Upvotes: 1
