anrisakaki96

Reputation: 313

How do I add a line of best fit to a scatterplot with multiple variables in ggplot?

My data looks like this:

N <- 100  # sample size; not defined in the original post, 100 is an assumed value

Male_0602 <- rnorm(N)
Female_0602 <- rnorm(N)
tce <- rnorm(N)

data <- data.frame(Male_0602, Female_0602, tce)

Currently, my code looks like this:

library(ggplot2)

ggplot(data, aes(x = tce * 100)) +
  geom_point(aes(y = Male_0602, colour = "Male")) +
  geom_point(aes(y = Female_0602, colour = "Female")) +
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005")

My scatterplot currently looks like this: (image not shown)

I would like to add a separate line of best fit for Male and for Female. Can you please tell me how to do that? Thank you.

Upvotes: 0

Views: 75

Answers (2)

JaredS

Reputation: 240

The following code converts the data frame into a longer format so that both variables can be plotted with a single geom, then uses geom_smooth to add the best-fit lines.

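library(tidyr)    # pivot_longer() and the %>% pipe
library(forcats)  # as_factor()
library(ggplot2)

data %>%
  pivot_longer(cols = c(Male_0602, Female_0602), names_to = "cats") %>%
  ggplot(aes(x = tce * 100, y = value, colour = as_factor(cats))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # one straight-line fit per colour group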
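
To see what the reshaping step in the code above produces, here is a minimal sketch on a small simulated data frame. The seed and the sample size N = 5 are assumptions chosen only for illustration; they are not from the original post.

```r
library(tidyr)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
N <- 5       # assumed sample size, kept small for inspection

data <- data.frame(Male_0602 = rnorm(N),
                   Female_0602 = rnorm(N),
                   tce = rnorm(N))

# Stack the two wage columns: the old column names go into "cats",
# the numbers go into "value", and tce is repeated for each row.
long <- pivot_longer(data, cols = c(Male_0602, Female_0602), names_to = "cats")

names(long)  # tce, cats, value
nrow(long)   # 2 * N rows: one per (observation, column) pair
```

Because each row now carries its own group label in cats, mapping colour to that column lets a single geom_smooth layer fit one line per group.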

Upvotes: 2

Mohanasundaram

Reputation: 2949

Convert the data frame into long format and add geom_smooth().

library(tidyr)
library(ggplot2)

data %>% 
  pivot_longer(cols = c(Male_0602, Female_0602), names_to = "gender") %>% 
  ggplot(aes(x = tce*100, y = value, color = gender)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005")
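
If reshaping is not convenient, a possible alternative is to keep the wide data frame from the question and add one geom_smooth layer per column, reusing the same colour labels as the point layers. This is only a sketch: the seed and N = 100 are assumed values, and the long-format approach above is usually tidier once there are more than two groups.

```r
library(ggplot2)

set.seed(1)  # hypothetical seed, only so the sketch is reproducible
N <- 100     # assumed sample size; N is not defined in the question

data <- data.frame(Male_0602 = rnorm(N),
                   Female_0602 = rnorm(N),
                   tce = rnorm(N))

# Each smooth layer gets its own y column and the same colour label
# as the matching point layer, so points and line share a legend entry.
p <- ggplot(data, aes(x = tce * 100)) +
  geom_point(aes(y = Male_0602, colour = "Male")) +
  geom_point(aes(y = Female_0602, colour = "Female")) +
  geom_smooth(aes(y = Male_0602, colour = "Male"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(y = Female_0602, colour = "Female"),
              method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Province-level tariff cut",
       y = "Change in province-level wage between 2001 and 2005",
       colour = NULL)

print(p)
```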

Upvotes: 2
