Marco Pastor Mayo

Reputation: 853

Different color scale for geom_point and geom_smooth on ggplot

I am trying to plot observations and their grouped regression lines with ggplot as follows:

ggplot(df, aes(x = cabpol.e, y = pred.vote_share, color = coalshare)) +
  geom_point() +
  scale_color_gradient2(midpoint = 50, low="blue", mid="green", high="red") +
  geom_smooth(aes(x = cabpol.e, y = pred.vote_share, group=coalshare1, fill = coalshare1), se = FALSE, method='lm') +
  scale_fill_manual(values = c(Junior="blue", Medium="green", Senior="red"))

The problem is that the lines from geom_smooth are all the same color. I tried using scale_fill_manual to set the color for each group by hand and avoid having two different color scales, but all the lines still appear blue. How can I make each line a different color?

As requested, here is a reproducible example with the same problem:

library(ggplot2)

set.seed(1000)
dff <- data.frame(x=rnorm(100, 0, 1),
                  y=rnorm(100, 1, 2),
                  z=seq(1, 100, 1),
                  g=rep(c("A", "B"), 50))
ggplot(dff, aes(x = x, y = y, color = z, group = g, fill = g)) +
  geom_point() +
  scale_color_gradient2(midpoint = 50, low="blue", high="red") +
  geom_smooth(se = FALSE, method='lm')


Upvotes: 2

Views: 3421

Answers (2)

Marco Pastor Mayo

Reputation: 853

Group trends between the x and y variables can be plotted by giving geom_line and geom_point different data frames: one with predicted values for the lines and one with the raw data for the points. Map color to the same variable in the ggplot() call so that both layers share one color scale, then group by that variable in geom_line.

library(ggplot2)

# preds holds the predicted values, df the raw observations
p2 <- ggplot(NULL, aes(x = cabpol.e, y = vote_share, color = coalshare)) +
  # one line per coalshare value, colored on the shared scale
  # (in ggplot2 >= 3.4.0, use linewidth = 1 instead of size = 1 for lines)
  geom_line(data = preds, aes(group = coalshare, color = coalshare), size = 1) +
  geom_point(data = df, aes(x = cabpol.e, y = vote_share)) +
  scale_color_gradient2(name = "Share of Seats\nin Coalition (%)",
      midpoint = 50, low="blue", mid = "green", high="red") +
  xlab("Ideological Differences on State/Market") +
  ylab("Vote Share (%)") +
  ggtitle("Vote Share Won by Coalition Parties in Next Election")
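The code above assumes a preds data frame that is not shown. As a rough sketch of how one could be built, here is a version on made-up data. The data frame dat, the Low/High binning in g, and the choice of the group mean of z as each line's color value are all assumptions for illustration, not part of the original analysis.

```r
library(ggplot2)

set.seed(1000)
# Hypothetical data: z is continuous, g bins it into two groups
dat <- data.frame(x = rnorm(100), y = rnorm(100, 1, 2), z = 1:100)
dat$g <- ifelse(dat$z <= 50, "Low", "High")

# Fit one linear model per group and predict at the ends of that group's x range.
# Each line gets a single representative z (assumed here: the group mean),
# so it can be colored on the same continuous scale as the points.
preds <- do.call(rbind, lapply(split(dat, dat$g), function(d) {
  fit <- lm(y ~ x, data = d)
  grid <- data.frame(x = range(d$x))
  grid$y <- predict(fit, newdata = grid)
  grid$z <- mean(d$z)
  grid$g <- d$g[1]
  grid
}))

# Lines come from preds, points from dat; both share the colour scale for z
p <- ggplot(NULL, aes(x = x, y = y, colour = z)) +
  geom_line(data = preds, aes(group = g)) +
  geom_point(data = dat) +
  scale_colour_gradient2(midpoint = 50, low = "blue", mid = "green", high = "red")
```

Printing p draws the plot. Because each line carries one constant z value, it picks up a single color from the gradient.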


Upvotes: 1

Henry Cyranka

Reputation: 3060

My solution to this problem would be to add one geom_smooth call per factor level, each time subsetting the data to that level. A color set outside aes() in each call then applies to that line only. As long as the factor does not have many levels, this solution is not terribly inefficient.

library(ggplot2)
library(dplyr)  # provides filter() for data frames

dff <- data.frame(x=rnorm(100, 0, 1),
                  y=rnorm(100, 1, 2),
                  z=seq(1, 100, 1),
                  g=rep(c("A", "B"), 50))

ggplot(dff, aes(x = x, y = y,
                color = z,
                group = g)) +
  geom_point() +
  scale_color_gradient2(midpoint = 50, low="blue", high="red") +
  # regression line for group A only, with a fixed color
  geom_smooth(
    aes(x = x, y = y),
    color = "red",
    method = "lm",
    data = filter(dff, g == "A"),
    se = FALSE
  ) +
  # regression line for group B only, with a fixed color
  geom_smooth(
    aes(x = x, y = y),
    color = "blue",
    method = "lm",
    data = filter(dff, g == "B"),
    se = FALSE
  )
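If there are more than a couple of levels, the repeated calls can be generated instead of typed out. The following is a minimal sketch of that idea, not a definitive implementation. The line_cols vector and the smooth_layers name are made up for illustration, and it relies on ggplot2 accepting a list of layers on the right-hand side of +.

```r
library(ggplot2)

set.seed(1000)
dff <- data.frame(x = rnorm(100, 0, 1),
                  y = rnorm(100, 1, 2),
                  z = seq(1, 100, 1),
                  g = rep(c("A", "B"), 50))

# Hypothetical mapping from factor level to line colour
line_cols <- c(A = "red", B = "blue")

# Build one geom_smooth layer per level, each on its own subset
# of the data and with a fixed colour set outside aes()
smooth_layers <- lapply(names(line_cols), function(lvl) {
  geom_smooth(data = dff[dff$g == lvl, ],
              aes(x = x, y = y),
              inherit.aes = FALSE,
              colour = line_cols[[lvl]],
              method = "lm", formula = y ~ x, se = FALSE)
})

# Points use the continuous colour scale; the list of smooth layers is added with +
p <- ggplot(dff, aes(x = x, y = y, colour = z)) +
  geom_point() +
  scale_colour_gradient2(midpoint = 50, low = "blue", high = "red") +
  smooth_layers
```

Adding a level then only needs a new entry in line_cols. The lines still get no legend, since their colors are fixed values rather than mapped aesthetics.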


Upvotes: 2
