itsMeInMiami

Reputation: 2669

In ggplot, how do I plot the mean line for two groups in a scatterplot?

I would like to show the mean of two groups in a scatterplot. I have sorted the data so the groups are next to each other: group 1 is the first 11 records and group 2 is the next 133. How can I tell ggplot to draw one line across the range for the first group (House 1-11) and a second line across the range for the second group (House 12-144)?

Here is what I have so far:

And the code is here:

library(tidyverse)
library(tidymodels)

data(ames)
ames <- AmesHousing::make_ames()

set.seed(1)
split  <- initial_split(ames, prop = 0.95, strata = "Sale_Price")
ames_plot   <- testing(split)

model1 <- lm(Sale_Price ~ Central_Air, data = ames_plot)

p1 <- model1 %>%
  broom::augment() %>%
  arrange(Central_Air) %>% 
  mutate(House = row_number()) %>% 
  ggplot(aes(House, Sale_Price, color = Central_Air)) + 
  geom_point(size = 1, alpha = 0.3) +
  geom_segment(aes(x = 1, y = .fitted, xend = 144, yend =.fitted)) +
  scale_y_continuous(labels = scales::dollar) 
p1

Using geom_smooth(formula = 'y ~ x', se = FALSE, method = "lm") instead of geom_segment() gets me close to what I want, but I want to show the actual predicted values coming from the lm().

Upvotes: 0

Views: 385

Answers (1)

MrFlick

Reputation: 206167

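It would be best just to summarize your data for that layer. As written, geom_segment() draws one segment per row, each spanning the whole x range (1 to 144); with one summary row per group you get one segment per group, covering only that group's houses. For example: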

model1 %>%
  broom::augment() %>%
  arrange(Central_Air) %>%
  mutate(House = row_number()) %>%
  ggplot(aes(House, Sale_Price, color = Central_Air)) +
  geom_point(size = 1, alpha = 0.3) +
  # A function passed as `data` is called with the plot data;
  # here it returns one row per Central_Air group for this layer only.
  geom_segment(
    aes(x = first, y = .fitted, xend = last, yend = .fitted),
    data = function(x) {
      x %>%
        group_by(Central_Air) %>%
        # first/last House index of the group and its (constant) fitted value
        summarize(first = first(House), last = last(House),
                  .fitted = mean(.fitted), .groups = "drop_last")
    }
  ) +
  scale_y_continuous(labels = scales::dollar)
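
If you would rather inspect the summary before plotting, the same idea can be written with the summary stored as its own data frame. This is only a sketch on the built-in mtcars data, not your Ames data: the names plot_data, segments, group, and index are made up for the illustration, and am (transmission) stands in for Central_Air.

```r
library(dplyr)
library(ggplot2)

# Stand-in model: one categorical predictor, so each group's fitted
# value is simply that group's mean of the response.
model <- lm(mpg ~ factor(am), data = mtcars)

# Sort by group and number the rows, mirroring the House index above.
plot_data <- mtcars %>%
  mutate(group = factor(am), .fitted = unname(fitted(model))) %>%
  arrange(group) %>%
  mutate(index = row_number())

# One row per group: where it starts and ends on the x axis,
# and the fitted value to draw the horizontal line at.
segments <- plot_data %>%
  group_by(group) %>%
  summarize(first = min(index), last = max(index),
            .fitted = mean(.fitted), .groups = "drop")

p <- ggplot(plot_data, aes(index, mpg, color = group)) +
  geom_point() +
  # This layer uses the two-row summary instead of the full data.
  geom_segment(aes(x = first, xend = last, y = .fitted, yend = .fitted),
               data = segments)
p
```

Printing segments shows exactly which x range and y value each line will use, which makes it easy to check that the lines really are the lm() predictions.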

Upvotes: 3
