Reputation: 1378
I am looking for a way to summarize data within a ggplot call, not before. I could pre-aggregate the data and then plot it, but I know there is a way to do it within a ggplot call. I'm just unsure how.
In this example, I want to get a mean for each (x,y) combo and map it onto the colour aes:
    library(tidyverse)

    df <- tibble(x = rep(c(1, 2, 4, 1, 5), 10),
                 y = rep(c(1, 2, 3, 1, 5), 10),
                 col = sample(c(1:100), 50))

    df_summar <- df %>%
      group_by(x, y) %>%
      summarise(col_mean = mean(col))

    ggplot(df_summar, aes(x = x, y = y, col = col_mean)) +
      geom_point(size = 5)
I think there must be a better way that avoids the pre-ggplot step (yes, I could also have piped the dplyr transformations into the ggplot call, but the mechanics would be the same).
For instance, geom_count() counts the instances and maps them onto the size aes:

    ggplot(df, aes(x = x, y = y)) + geom_count()
I want the same, but with mean instead of count, and col instead of size.
I'm guessing I need stat_summary() or a stat() call (a replacement for the ..xxx.. notation), but I can't get it to give me what I need.
Upvotes: 2
Views: 1302
Reputation: 35387
You'll need stat_summary_2d:

    ggplot(df, aes(x, y, z = col)) +
      stat_summary_2d(aes(col = ..value..), fun = 'mean', geom = 'point', size = 5)
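(Or after_stat(value) in ggplot2 3.3.0 and later, where it replaces the ..value.. notation. The calc(value) spelling from the dev version at the time of writing did not make it into a release.)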
You can pass any arbitrary function to fun.
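As a rough sketch of passing your own function to fun: the helper value_range below is a hypothetical name made up for illustration, the seed is only there for reproducibility, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Same shape of data as in the question; the seed is an assumption
# added only so the example is reproducible.
set.seed(1)
df <- data.frame(
  x   = rep(c(1, 2, 4, 1, 5), 10),
  y   = rep(c(1, 2, 3, 1, 5), 10),
  col = sample(1:100, 50)
)

# Hypothetical summary function: spread of the z values within each bin.
value_range <- function(z) max(z) - min(z)

p_range <- ggplot(df, aes(x, y, z = col)) +
  stat_summary_2d(
    aes(colour = after_stat(value)),  # computed summary mapped to colour
    fun  = value_range,
    geom = "point",
    size = 5
  )
```

Printing p_range draws one point per occupied bin, coloured by the spread of col in that bin.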
While stat_summary seems like it would be useful, it is not in this case. It is specialized for the most common plotting transformation: summarizing a range of y values, grouped by x, into a set of summary statistics that are plotted as y (and optionally ymin and ymax). You want to group by both x and y, so 2d it is.
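For contrast, here is a minimal sketch of what stat_summary is built for: collapsing col to a mean (with min and max as the range) for each x. The fun, fun.min and fun.max argument names assume ggplot2 3.3.0 or later, and the seed is an assumption added for reproducibility.

```r
library(ggplot2)

# Same shape of data as in the question; seed added only for reproducibility.
set.seed(1)
df <- data.frame(
  x   = rep(c(1, 2, 4, 1, 5), 10),
  y   = rep(c(1, 2, 3, 1, 5), 10),
  col = sample(1:100, 50)
)

# One summary per distinct x: the mean is drawn as y, min and max as ymin/ymax.
p_1d <- ggplot(df, aes(x, col)) +
  stat_summary(
    fun     = mean,
    fun.min = min,
    fun.max = max,
    geom    = "pointrange"
  )
```

The grouping here is by x only, which is why it cannot produce a separate mean for each (x,y) pair.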
Note that this uses binning, however, so to get the points to line up accurately with the data you need to increase the number of bins (e.g. bins = 1e3), which makes each bin smaller. Unfortunately, there is no non-binning 2d summary stat.
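A sketch of the finer-binned version, assuming ggplot2 3.3.0 or later for after_stat(); the seed is an assumption added for reproducibility. layer_data() is used only to inspect what the stat computed, so the per-bin means can be compared with a plain aggregate().

```r
library(ggplot2)

# Same shape of data as in the question; seed added only for reproducibility.
set.seed(1)
df <- data.frame(
  x   = rep(c(1, 2, 4, 1, 5), 10),
  y   = rep(c(1, 2, 3, 1, 5), 10),
  col = sample(1:100, 50)
)

p_fine <- ggplot(df, aes(x, y, z = col)) +
  stat_summary_2d(
    aes(colour = after_stat(value)),
    fun  = "mean",
    geom = "point",
    size = 5,
    bins = 1e3   # many small bins, so bin centres sit close to the real x/y
  )

# Values computed by the stat, one row per non-empty bin.
binned <- layer_data(p_fine)

# Reference: the pre-aggregated means the question wanted to avoid computing.
reference <- aggregate(col ~ x + y, data = df, FUN = mean)
```

With this data each distinct (x,y) pair should land in its own bin, so binned$value and reference$col should hold the same means, though not necessarily in the same row order.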
Upvotes: 3