Reputation: 11

Determing slope of multiple lines in the same geom_point graph in ggplot2

Columns and first rows of code I have several different geom_smooth(method="glm") lines in the same geom_point graph in ggplot2. I'm looking to determine the regression equation for each line, including the slope equation. I found a similar post but I'm still having some problems. My code is:

native <- read.csv("native.gather.C4C5C6C7.csv")

ggplot(native, aes(x=YearsPostRelease, y=PercentNative, col=FieldType, linetype=FieldType)) + 
    geom_point(size=0.7) + 
    geom_smooth(data = native, 
                method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
    scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
    scale_y_continuous(limits = c(0, 100), 
                       breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) + 
    ggtitle("Percent Native Through Time")

Thanks in advance!

Upvotes: 1

Answers (2)

Jon Spring

Reputation: 66570

Here's an approach using lm_eqn as defined here. You probably experienced issues because your data don't match the expected input of the function. I used mtcars here since I don't have your data, exploring the relationship between mpg and wt between cyl groups. Below, note the customization of the relationship I am investigating.

lm_eqn <- function(df){
  m <- lm(mpg ~ wt, df);
  eq <- substitute(italic(mpg) == a + b %.% italic(wt)*","~~italic(r)^2~"="~r2, 
                   list(a = format(coef(m)[1], digits = 2), 
                        b = format(coef(m)[2], digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq));                 
}

We can apply that to manually defined subsets of the data. There's probably a smarter way to apply this to multiple groups more automatically, but since its hard to automate smart label locations, this might be good enough.

library(ggplot2); library(dplyr)
ggplot(mtcars, aes(x=wt, y=mpg, 
                   col=as.factor(cyl), linetype=as.factor(cyl))) + 
  geom_point() + 
  geom_smooth(data = mtcars, 
              method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  annotate("text", x = 3, y = 30, label = lm_eqn(mtcars %>% filter(cyl == 4)), parse = TRUE) +
  annotate("text", x = 4.3, y = 20, label = lm_eqn(mtcars %>% filter(cyl == 6)), parse = TRUE) +
  annotate("text", x = 4, y = 12, label = lm_eqn(mtcars %>% filter(cyl == 8)), parse = TRUE)

Upvotes: 4

OTStats

Reputation: 1868

Applying what Jon contributed above, you can customize this function to your data as follows.

Again it's difficult to know completely what your underlying data look like, but let's assume that your field, FieldType, contains three factors: BSSFields, CSSFields, DSSFields.

# Load data
library(tidyverse)
native <- read.csv("native.gather.C4C5C6C7.csv")

# Define function
lm_eqn <- function(df){
  m <- lm(PercentNative ~ YearsPostRelease, df);
  eq <- substitute(italic(native) == a + b %.% 
italic(YearsPostRelease)*","~~italic(r)^2~"="~r2, 
                   list(a = format(coef(m)[1], digits = 2), 
                        b = format(coef(m)[2], digits = 2), 
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq));                 
}

# Plot data
ggplot(native, aes(x = YearsPostRelease, 
                   y = PercentNative, 
                   col = FieldType, 
                   linetype = FieldType)) +
  geom_point(size=0.7) + 
  geom_smooth(data = native, 
              method ="glm", alpha = 0, show.legend = FALSE, linetype = 'solid') +
  scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55)) +
  scale_y_continuous(limits = c(0, 100), 
                     breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) + 
  annotate("text", x = 3, y = 30, 
           label = lm_eqn(native %>% filter(FieldType == "BSSFields")), parse = TRUE) +
  annotate("text", x = 4, y = 20, 
           label = lm_eqn(native %>% filter(FieldType == "CSSFields")), parse = TRUE) +
  annotate("text", x = 5, y = 10, 
           label = lm_eqn(native %>% filter(FieldType == "DSSFields")), parse = TRUE)
  ggtitle("Percent Native Through Time")

It's important to note that the location of these regressions equations will have be modified based on the range of YearsPostRelease and PercentNative. Also, if FieldTypes contain more than three levels, you'll have to add corresponding annotate() calls, customized to the level name.

Upvotes: 0

Determing slope of multiple lines in the same geom_point graph in ggplot2

Answers (2)

Related Questions