rnorouzian

Reputation: 7517

Adding group means to geom_point() in ggplot

How can I overlay the mean of each vertical column of points on the plot below, using a different shape for each mean (e.g., squares, triangles, diamonds)?

library(ggplot2)
library(dplyr)  # provides %>%, which ggplot2 does not export
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
ten <- subset(hsb, sch.id %in% unique(sch.id)[1:10]) # get 10 schools for display
ten %>% ggplot() + aes(meanses, math) + geom_point() + geom_smooth(method = "lm", se = FALSE)


Upvotes: 2

Views: 269

Answers (1)

r2evans

Reputation: 160447

Here is a starting approach. Its weakness is that it assumes rounding meanses to one decimal place is enough to keep the points grouped correctly. If your data are not always this "clean", you will likely need some cheap clustering as well.

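# Group by meanses rounded to one decimal; keep one x-position and the mean of math per group
means <- group_by(ten, g = format(round(meanses, 1))) %>%
  summarize(meanses = first(meanses), math = mean(math))
# Overlay the group means as large red points, one shape per group
ggplot(ten, aes(meanses, math)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point(data = means, aes(shape = g), color = "red", size = 5)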
# `geom_smooth()` using formula 'y ~ x'
# Warning: The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes
# difficult to discriminate; you have 7. Consider specifying shapes manually if you must
# have them.
# Warning: Removed 1 rows containing missing values (geom_point).

(The default shape palette supplies only six shapes, so the seventh group's mean is dropped from the plot; scale_shape_manual is likely required.)
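
To illustrate the scale_shape_manual suggestion, here is a minimal sketch. It uses a small made-up data frame (means_demo, a hypothetical stand-in for means) so that it runs without the CSV, and the particular shape codes are just one possible choice.

```r
library(ggplot2)

# Hypothetical stand-in for `means`: seven groups, one row each.
x_vals <- c(-0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6)
means_demo <- data.frame(
  g       = format(x_vals),
  meanses = x_vals,
  math    = c(9, 10, 11, 12, 13, 14, 15)
)

# One shape code per group, named by group label.
# 15 to 18 are solid shapes; 0 to 2 are hollow square, circle, triangle.
shape_values <- setNames(c(15, 16, 17, 18, 0, 1, 2), means_demo$g)

p <- ggplot(means_demo, aes(meanses, math)) +
  geom_point(aes(shape = g), color = "red", size = 5) +
  scale_shape_manual(values = shape_values)
```

With the real data, the same scale_shape_manual(values = ...) call could be added to the plot above, provided the vector has at least as many entries as there are groups.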

[Figure: ggplot2 scatter plot with group averages]

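A quick way to find the groups without relying on rounding alone: take the sorted unique meanses values, drop any that lie within 0.01 of the previous one, then assign each point to the nearest remaining value.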

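# Candidate group centers: sorted unique values of meanses
grps <- sort(unique(round(ten$meanses, 3)))
# Drop any candidate that is within 0.01 of its predecessor
grps[c(FALSE, diff(grps) < 0.01)] <- NA
grps <- grps[!is.na(grps)]
# Label each row with its nearest center, then summarize per group
means <- ten %>%
  group_by(grp = format(grps[ apply(abs(outer(meanses, grps, `-`)), 1, which.min) ])) %>%
  summarize(meanses = first(meanses), math = mean(math)) %>%
  ungroup()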

(It still has the same problem with the number of shapes, but that's easily worked around.)
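
To see what the grouping step does, here is the same logic in base R on a small toy vector. The values in x are made up for illustration; the 0.01 threshold is the one used above.

```r
# Toy x-values: three clusters with small jitter (hypothetical data).
x <- c(-0.502, -0.498, -0.500, 0.101, 0.099, 0.750)

# Candidate centers: sorted unique rounded values.
centers <- sort(unique(round(x, 3)))

# Drop any candidate closer than 0.01 to its predecessor.
centers[c(FALSE, diff(centers) < 0.01)] <- NA
centers <- centers[!is.na(centers)]

# Distance from every value to every center; pick the nearest center per value.
nearest <- apply(abs(outer(x, centers, `-`)), 1, which.min)
grp <- centers[nearest]
```

Here centers ends up with three values (-0.502, 0.099, 0.75), and the first three elements of x share one group, the next two another, and the last is on its own. Note that the surviving center is the smallest value in each cluster, not the cluster's midpoint.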

Upvotes: 1
