Reputation: 117
I have a data frame with three columns, call them (X, Y, Z), such that:
I want to plot Y against X (using ggplot2) and colour the points by the factor variable Z. This I have managed!
Now I need to plot some regression lines. I know how to plot one regression line for each set of points belonging to the same category (i.e. the same level of the factor Z). However, what I need is to plot TWO regression lines for each category (this might seem weird, but it is the way it is always done in the problem I am dealing with). So, for each category (Z) I should have one regression line computed from the first n elements belonging to that category and a second regression line computed from the remaining points in that category. Both lines should of course have the same colour, since they fit points from the same category (i.e. the same colour group).
Any help is very much appreciated! Thank you in advance
Upvotes: 0
Views: 599
Reputation: 460
If the two ranges of x that you want to regress on are independent and you want to generate 4 separate regression lines (as is my understanding of your question), then you can specify the data to use in two calls to geom_smooth().
Here, head() and tail() indicate which values of x you want to regress on, assuming the points are ordered in df. If they are not ordered, you will need to sort them first (e.g. with a call to arrange() on the x-axis values).
library(tidyverse)

# Some test data with 3 variables: a random response (y), a sequence (x), and a factor (z).
df <- tibble(x = seq(0.5, 25, 0.5),
             y = rnorm(50),
             z = sample(x = c("A", "B"), replace = TRUE, size = 50))

# A plot with each level of z coloured and 2 regression lines for each level.
# In a formula passed to `data`, .x is the plot's data (here df).
ggplot(df, aes(x, y, colour = z)) +
  geom_point() +
  geom_smooth(data = ~ head(.x, 30), method = "lm", se = FALSE) +  # first 30 rows of df
  geom_smooth(data = ~ tail(.x, 20), method = "lm", se = FALSE) +  # last 20 rows of df
  theme_minimal()
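Note that head() and tail() above cut the whole data frame at a row position, not each category separately. If the split really should be "the first n points within each category", one possible variant is to label the rows per group and let a single geom_smooth() fit one line per (z, segment) pair. This is only a sketch: the cutoff n_first and the column name segment are made up for illustration, and it assumes that "first" means smallest x within each category.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # only to make the random test data reproducible
df <- data.frame(
  x = seq(0.5, 25, 0.5),
  y = rnorm(50),
  z = sample(c("A", "B"), size = 50, replace = TRUE)
)

n_first <- 10  # hypothetical cutoff: points per category that go into the first fit

# Order by x within each category, then mark the first n_first rows of each
# category as "first" and all remaining rows of that category as "rest".
df_split <- df %>%
  arrange(z, x) %>%
  group_by(z) %>%
  mutate(segment = if_else(row_number() <= n_first, "first", "rest")) %>%
  ungroup()

# colour depends only on z, so both lines of a category share one colour;
# the group aesthetic splits the fit into one line per (z, segment) pair.
p <- ggplot(df_split, aes(x, y, colour = z)) +
  geom_point() +
  geom_smooth(aes(group = interaction(z, segment)),
              method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()

print(p)
```

If a category has fewer than n_first + 2 points, one of its segments will not have enough data for a meaningful line, so the cutoff needs to suit the smallest category.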
Upvotes: 1