Reputation: 141
Problem
I have some data points stored in a data.frame with three variables: x, y, and gender. My goal is to draw several lines fitted to all the data, plus lines fitted separately for males and females, over a scatter plot whose points are coloured by gender. It sounds easy, but some issues persist.
What I currently do is predict y values from each model on a new set of x values, combine the fitted lines in a data.frame, and then convert from wide to long format, with the model name as the third variable. (From these posts, "ggplot2: how to add the legend for a line added to a scatter plot?" and "Add legend to ggplot2 line plot", I learnt that mapping should be used instead of setting colours and legends separately.) However, while I get a multicolour line plot, the points are not coloured by gender (already a factor), as I expected from the posts I referenced.
I also know it might be possible to use aes(y = predict(model)), but I ran into other problems with that. I also tried colouring the points directly in aes and assigning a colour to each line separately, but then no legend is generated for the lines unless I map lty, which gives legend entries that are all the same colour.
I would appreciate any ideas, and I am open to changing the whole method.
Code
Note that two pairs of lines overlap, so only two lines appear. I guess adding some jitter to the data might make them look different.
library(ggplot2)

# Separate quadratic fits for men (gender == 0) and women (gender == 1)
slrmen <- lm(tc ~ x + I(x^2), data = data[data['gender'] == 0, ])
slrwomen <- lm(tc ~ x + I(x^2), data = data[data['gender'] == 1, ])

# Prediction grids over the observed range of x, one per gender
prdf <- data.frame(x = seq(from = range(data$x)[1],
                           to = range(data$x)[2], length.out = 100),
                   gender = as.factor(rep(1, 100)))
prdm <- data.frame(x = seq(from = range(data$x)[1],
                           to = range(data$x)[2], length.out = 100),
                   gender = as.factor(rep(0, 100)))

# Note: fullmodel is not defined in this snippet
prdf$fit <- predict(fullmodel, newdata = prdf)
prdm$fit <- predict(fullmodel, newdata = prdm)

# Wide table of fitted values, then reshaped to long format
rawplotdata <- data.frame(x = prdf$x, fullf = prdf$fit, fullm = prdm$fit,
                          linf = predict(slrwomen, newdata = prdf),
                          linm = predict(slrmen, newdata = prdm))
plotdata <- reshape2::melt(rawplotdata, id.vars = "x",
                           measure.vars = c("fullf", "fullm", "linf", "linm"),
                           variable.name = "fitmethod", value.name = "y")
plotdata$fitmethod <- as.factor(plotdata$fitmethod)

plt <- ggplot() +
  geom_line(data = plotdata, aes(x = x, y = y, group = fitmethod,
                                 colour = fitmethod)) +
  scale_colour_manual(name = "Fit Methods",
                      values = c("fullf" = "lightskyblue",
                                 "linf" = "cornflowerblue",
                                 "fullm" = "darkseagreen", "linm" = "olivedrab")) +
  geom_point(data = data, aes(x = x, y = y, fill = gender)) +
  scale_fill_manual(values = c("blue", "green")) ## This does not work as I expected...
show(plt)
Code for another method (omitted two lines), which generates same-colour legend and multi-color plot:
# prdf and prdm are just data frames containing the x's and fitted values for different models
ggplot(data = prdf, aes(x = x, y = fit)) +
  geom_line(aes(lty = "Female"), colour = "chocolate") +
  geom_line(data = prdm, aes(x = x, y = fit, lty = "Male"), colour = "darkblue") +
  geom_point(data = data, aes(x = x, y = y, colour = gender)) +
  scale_colour_discrete(name = "Gender", breaks = c(0, 1),
                        labels = c("Male", "Female"))
Upvotes: 1
Views: 5459
Reputation: 707
This comes down to using the colour aesthetic for lines and the fill aesthetic for points in your first example. Your second example works because the colour aesthetic is used for both lines and points.
By default, geom_point cannot map a variable to fill, because the default point shape (19) doesn't have a fill. For fill to work on points, you have to specify one of the fillable shapes, 21 to 25 (for example shape = 21), in geom_point(), outside of aes().
Perhaps this small reproducible example helps to illustrate the point:
Simulate data
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20, 80, length.out = 100), 2),
                   tc = c(x1, x2),
                   gender = factor(c(rep("Female", 100), rep("Male", 100))))
Fit models
slrmen <- lm(tc ~ x + I(x^2), data = data[data["gender"] == "Male", ])
slrwomen <- lm(tc ~ x + I(x^2), data = data[data["gender"] == "Female", ])
newdat <- data.frame(x = seq(20, 80, length.out = 200))
fitted.male <- data.frame(x = newdat,
                          gender = "Male",
                          tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat,
                            gender = "Female",
                            tc = predict(object = slrwomen, newdata = newdat))
Plot using colour aesthetics
Use the colour aesthetic for both points and lines (specify it in ggplot so that it gets inherited throughout). By default, geom_point can map a variable to colour.
library(ggplot2)
ggplot(data, aes(x = x, y = tc, colour = gender)) +
  geom_point() +
  geom_line(data = fitted.male) +
  geom_line(data = fitted.female) +
  scale_colour_manual(values = c("tomato", "blue")) +
  theme_bw()
Plot using colour and fill aesthetics
Use the fill aesthetic for points and the colour aesthetic for lines (specify the aesthetics in each geom_* to prevent them being inherited). This will reproduce the problem.
ggplot(data, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender)) +
  geom_line(data = fitted.male, aes(colour = gender)) +
  geom_line(data = fitted.female, aes(colour = gender)) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()
To fix this, set the shape argument in geom_point to a point shape that can be filled (one of 21 to 25).
ggplot(data, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender), shape = 21) +
  geom_line(data = fitted.male, aes(colour = gender)) +
  geom_line(data = fitted.female, aes(colour = gender)) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()
Created on 2021-09-19 by the reprex package (v2.0.1)
Note that the scales for colour and fill get merged automatically if the same variable is mapped to both aesthetics.
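Applied to the layout in the question (lines coloured by fit method, points filled by gender), the same fix might look like the sketch below. The data frames pts and fits, the fitmethod labels, and the colours are assumptions made up for illustration; they are not the asker's data or models.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical, not the asker's data)
set.seed(1)
pts <- data.frame(
  x = rep(seq(20, 80, length.out = 50), 2),
  gender = factor(rep(c("Female", "Male"), each = 50))
)
pts$tc <- 5 + (pts$gender == "Male") + rnorm(nrow(pts))

# One quadratic fit per gender, predicted on a common grid, stacked in long format
grid <- data.frame(x = seq(20, 80, length.out = 100))
fits <- do.call(rbind, lapply(levels(pts$gender), function(g) {
  m <- lm(tc ~ x + I(x^2), data = pts[pts$gender == g, ])
  data.frame(x = grid$x,
             tc = predict(m, newdata = grid),
             fitmethod = paste("quadratic,", g))
}))

# colour is mapped only for lines, fill only for points (shape 21 is fillable),
# so each aesthetic gets its own scale and its own legend
p <- ggplot() +
  geom_line(data = fits, aes(x = x, y = tc, colour = fitmethod)) +
  geom_point(data = pts, aes(x = x, y = tc, fill = gender), shape = 21) +
  scale_colour_manual(name = "Fit Methods",
                      values = c("quadratic, Female" = "cornflowerblue",
                                 "quadratic, Male" = "olivedrab")) +
  scale_fill_manual(name = "Gender",
                    values = c("Female" = "tomato", "Male" = "blue")) +
  theme_bw()
p
```

Because the two legends have different titles and different keys here, they should stay separate rather than merge.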
Upvotes: 4
Reputation: 24878
It seems to me that what you really want to do is use ggplot2::stat_smooth instead of trying to predict yourself.
Borrowing the data from @scrameri:
ggplot(data, aes(x = x, y = tc, color = gender)) +
  geom_point() +
  stat_smooth(aes(linetype = "X^2"), method = 'lm', formula = y ~ x + I(x^2)) +
  stat_smooth(aes(linetype = "X^3"), method = 'lm', formula = y ~ x + I(x^2) + I(x^3)) +
  scale_color_manual(values = c("darkseagreen", "lightskyblue"))
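As a rough check that this is equivalent to predicting by hand: with colour = gender mapped in ggplot(), stat_smooth fits one model per gender. The sketch below compares the smoothed values for one group against a manual lm fit. The data frame d is simulated for illustration only, and the group number 1 is assumed to correspond to the first factor level ("Female").

```r
library(ggplot2)

# Simulated stand-in data (hypothetical)
set.seed(42)
d <- data.frame(x = rep(seq(20, 80, length.out = 60), 2),
                gender = factor(rep(c("Female", "Male"), each = 60)))
d$tc <- 5 + (d$gender == "Male") + rnorm(nrow(d))

p_smooth <- ggplot(d, aes(x = x, y = tc, colour = gender)) +
  stat_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE)

# Values computed by stat_smooth for the first (and only) layer
sm <- layer_data(p_smooth, 1)
female <- sm[sm$group == 1, ]

# Manual fit on the same subset, evaluated at the same x positions
manual <- lm(tc ~ x + I(x^2), data = d[d$gender == "Female", ])
max_diff <- max(abs(female$y - predict(manual, newdata = data.frame(x = female$x))))
max_diff  # expected to be zero up to floating-point error
```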
Upvotes: 2