Reputation: 11
[Image: columns and first rows of code]
I have several different geom_smooth(method="glm") lines in the same geom_point graph in ggplot2. I'm looking to determine the regression equation for each line, including the slope. I found a similar post but I'm still having some problems. My code is:
library(ggplot2)
native <- read.csv("native.gather.C4C5C6C7.csv")
ggplot(native, aes(x = YearsPostRelease, y = PercentNative,
                   col = FieldType, linetype = FieldType)) +
  geom_point(size = 0.7) +
  geom_smooth(data = native,
              method = "glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
  scale_y_continuous(limits = c(0, 100),
                     breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) +
  ggtitle("Percent Native Through Time")
Thanks in advance!
Upvotes: 1
Views: 1101
Reputation: 66570
Here's an approach using lm_eqn as defined here. You probably experienced issues because your data don't match the input the function expects. I used mtcars here since I don't have your data, exploring the relationship between mpg and wt within each cyl group. Note that the formula and the label inside the function are customized to the relationship I am investigating.
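lm_eqn <- function(df) {
  # Fit the regression whose equation will be shown on the plot
  m <- lm(mpg ~ wt, data = df)
  # Build a plotmath expression; unname() keeps the coefficient names
  # (e.g. "(Intercept)") out of the deparsed label
  eq <- substitute(italic(mpg) == a + b %.% italic(wt)*","~~italic(r)^2~"="~r2,
                   list(a = format(unname(coef(m)[1]), digits = 2),
                        b = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  # Return the expression as a string so annotate(..., parse = TRUE) can render it
  as.character(as.expression(eq))
}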
We can apply that to manually defined subsets of the data. There's probably a smarter way to apply this to multiple groups automatically, but since it's hard to automate smart label locations, this might be good enough.
library(ggplot2)
library(dplyr)
ggplot(mtcars, aes(x = wt, y = mpg,
                   col = as.factor(cyl), linetype = as.factor(cyl))) +
  geom_point() +
  geom_smooth(data = mtcars,
              method = "glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  annotate("text", x = 3, y = 30, label = lm_eqn(mtcars %>% filter(cyl == 4)), parse = TRUE) +
  annotate("text", x = 4.3, y = 20, label = lm_eqn(mtcars %>% filter(cyl == 6)), parse = TRUE) +
  annotate("text", x = 4, y = 12, label = lm_eqn(mtcars %>% filter(cyl == 8)), parse = TRUE)
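The "smarter way" mentioned above could look roughly like the sketch below: split the data by group, build one label per group, and draw all labels with a single geom_text() layer. This is only one possible approach. The names cars, groups, and labels_df are made up for illustration, and the label positions (stacked in the top-right corner) are an arbitrary choice that sidesteps the placement problem rather than solving it.

```r
library(ggplot2)

# Same helper as above, repeated so this block runs on its own.
lm_eqn <- function(df) {
  m <- lm(mpg ~ wt, data = df)
  eq <- substitute(italic(mpg) == a + b %.% italic(wt)*","~~italic(r)^2~"="~r2,
                   list(a = format(unname(coef(m)[1]), digits = 2),
                        b = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

# Treat cyl as a factor once, so points, lines, and labels share one colour scale.
cars <- transform(mtcars, cyl = factor(cyl))

# One data frame per group, then one label per group.
groups <- split(cars, cars$cyl)
labels_df <- data.frame(
  cyl   = factor(names(groups), levels = levels(cars$cyl)),
  label = vapply(groups, lm_eqn, character(1)),
  # Arbitrary placement: right-aligned, stacked downward from the top.
  x     = max(cars$wt),
  y     = max(cars$mpg) - 2.5 * (seq_along(groups) - 1),
  stringsAsFactors = FALSE
)

p <- ggplot(cars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE) +
  geom_text(data = labels_df,
            aes(x = x, y = y, label = label, colour = cyl),
            parse = TRUE, hjust = 1, show.legend = FALSE, inherit.aes = FALSE)
print(p)
```

Because each label is drawn in its group's colour, the equations stay readable even though they are not placed next to their lines. Adding a group then needs no extra annotate() call.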
Upvotes: 4
Reputation: 1868
Applying what Jon contributed above, you can customize this function to your data as follows.
Again, it's difficult to know exactly what your underlying data look like, but let's assume that your grouping column, FieldType, is a factor with three levels: BSSFields, CSSFields, DSSFields.
# Load data
library(tidyverse)
native <- read.csv("native.gather.C4C5C6C7.csv")
# Define function
lm_eqn <- function(df) {
  m <- lm(PercentNative ~ YearsPostRelease, data = df)
  # unname() keeps the coefficient names out of the deparsed label
  eq <- substitute(italic(native) == a + b %.%
                     italic(YearsPostRelease)*","~~italic(r)^2~"="~r2,
                   list(a = format(unname(coef(m)[1]), digits = 2),
                        b = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}
# Plot data
ggplot(native, aes(x = YearsPostRelease,
                   y = PercentNative,
                   col = FieldType,
                   linetype = FieldType)) +
  geom_point(size = 0.7) +
  geom_smooth(data = native,
              method = "glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
  scale_y_continuous(limits = c(0, 100),
                     breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) +
  annotate("text", x = 3, y = 30,
           label = lm_eqn(native %>% filter(FieldType == "BSSFields")), parse = TRUE) +
  annotate("text", x = 4, y = 20,
           label = lm_eqn(native %>% filter(FieldType == "CSSFields")), parse = TRUE) +
  annotate("text", x = 5, y = 10,
           label = lm_eqn(native %>% filter(FieldType == "DSSFields")), parse = TRUE) +
  ggtitle("Percent Native Through Time")
It's important to note that the locations of these regression equations will have to be modified based on the range of YearsPostRelease and PercentNative. Also, if FieldType contains more than three levels, you'll have to add corresponding annotate() calls, customized to the level name.
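One thing worth checking is that the labels are computed with lm() while the lines are drawn with geom_smooth(method = "glm"). With glm()'s default gaussian family the two fits should give the same intercept and slope, so the labels should describe the plotted lines. The sketch below checks this per group on mtcars, which stands in for the original CSV because that file is not available here. The names groups, coef_lm, and coef_glm are illustrative.

```r
# mtcars is a stand-in for the real data; cyl plays the role of FieldType.
groups <- split(mtcars, mtcars$cyl)

# Intercept and slope per group from lm()
coef_lm <- t(sapply(groups, function(d) coef(lm(mpg ~ wt, data = d))))

# The same from glm() with its default family (gaussian, identity link)
coef_glm <- t(sapply(groups, function(d) coef(glm(mpg ~ wt, data = d))))

# One row per group: intercept in the first column, slope in the second
print(round(coef_lm, 3))

# Compare the two coefficient tables up to numerical tolerance
print(all.equal(coef_lm, coef_glm))
```

For the data in the question, the same check would use PercentNative ~ YearsPostRelease as the formula and split on native$FieldType. If a non-default family is ever passed to geom_smooth(), the lm()-based labels would no longer be guaranteed to match the lines.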
Upvotes: 0