socialresearcher

Reputation: 67

Using ggplot in R to create a line graph for two different groups

I'm trying to create a line graph depicting different trajectories over time for two groups/conditions. I have two groups for which the data 'eat' was collected at five time points (1,2,3,4,5). I'd like the lines to connect the mean point for each group at each of five time points, so I'd have two points at Time 1, two points at Time 2, and so on.

Here's a reproducible example:

# Example data
library(tidyverse)  # attaches ggplot2, so no separate library(ggplot2) call is needed
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
                 Condition = rep(c(0, 1), each = 15),
                 time = c(1, 2, 3, 4, 5),  # recycled to fill all 30 rows
                 eat = eat
)
df$time <- as.factor(df$time)
df$Condition <- as.factor(df$Condition)

# Create the plot.
ggplot(df, aes(x = time, y = eat, fill = Condition)) + geom_line() +
  geom_point(size = 4, shape = 21) +
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, colour = "red", geom = "line")

The problem is that I need my lines to run horizontally (i.e., to show two differently colored lines moving across the x-axis), but this code just connects the dots vertically:

Here's what it looks like:

If I don't convert time to a factor, but only convert Condition to a factor, I get a mess of lines. The same thing happens in my actual data as well.

[Image: example graph result]

I'd like it to look like this aesthetically, with the transparent error envelopes wrapping each line. However, I don't want the lines to be curvy; I want them to be straight, connecting the means at each time point.

Upvotes: 2

Views: 3360

Answers (3)

Calum You

Reputation: 15062

Here are the lines running in straight segments through the mean at each time point, with a ribbon spanning one standard error on either side of the mean. One stat_summary draws the mean line using the colour aesthetic; the other draws the ribbon using the inherited fill aesthetic. ggplot2::mean_se is a convenient function that takes a vector and returns a data frame with the mean and the mean +/- some number of standard errors (one by default). This is the right format for the fun.data argument to stat_summary, which passes these values to the geom specified. Here, geom_ribbon accepts the ymin and ymax values to plot a ribbon across the graph.

library(tidyverse)
set.seed(12345)
eat <- sample(1:7, size = 30, replace = TRUE)
df <- data.frame(
  Condition = rep(c(0, 1), each = 15),
  time = c(1, 2, 3, 4, 5),  # kept numeric; recycled across all 30 rows
  eat = eat
)
df$Condition <- as.factor(df$Condition)

ggplot(df, aes(x = time, y = eat, fill = Condition)) +
  geom_point(size = 4, shape = 21, colour = "black") +
  # Ribbon from mean - SE to mean + SE, filled by Condition
  stat_summary(geom = "ribbon", fun.data = mean_se, alpha = 0.2) +
  # Straight segments through the group means
  stat_summary(
    mapping = aes(colour = Condition),
    geom = "line",
    fun = mean,  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
    show.legend = FALSE
  )

Created on 2018-07-09 by the reprex package (v0.2.0).
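To make it clearer what the two stat_summary layers compute, here is a rough sketch of the same plot built from a summary table calculated by hand. The names summ, mean_eat, se and p_manual are only illustrative, and the sketch assumes dplyr >= 1.0.0 for the .groups argument.

```r
library(dplyr)
library(ggplot2)

# mean_se() returns y (the mean), ymin and ymax (mean -/+ one standard error)
mean_se(c(1, 2, 3, 4, 5))  # y = 3, ymin about 2.29, ymax about 3.71

set.seed(12345)
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time = rep(1:5, times = 6),
  eat = sample(1:7, size = 30, replace = TRUE)
)

# One row per Condition x time cell: the mean and its standard error
summ <- df %>%
  group_by(Condition, time) %>%
  summarise(
    mean_eat = mean(eat),
    se = sd(eat) / sqrt(n()),
    .groups = "drop"
  )

# Ribbon and line come from the summary table; raw points come from df
p_manual <- ggplot(summ, aes(x = time, y = mean_eat, fill = Condition)) +
  geom_ribbon(aes(ymin = mean_eat - se, ymax = mean_eat + se), alpha = 0.2) +
  geom_line(aes(colour = Condition)) +
  geom_point(data = df, aes(y = eat), size = 4, shape = 21, colour = "black")
```

Calling print(p_manual) should draw the same straight mean lines and ribbons as the stat_summary version, and the summ table can be inspected or reused elsewhere.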

Upvotes: 3

Robert Kahne

Reputation: 174

I think this code will get you most of the way there:

library(tidyverse)

eat <- sample(1:7, size = 30, replace = TRUE)
tibble(id = rep(c(1, 2, 3, 4, 5, 6), each = 5),
       Condition = factor(rep(c(0, 1), each = 15)),
       time = factor(rep(c(1, 2, 3, 4, 5), 6)),
       eat = eat) %>%
  ggplot(aes(x = time, y = eat, fill = Condition, group = Condition)) +
  geom_point(size = 4, shape = 21) +
  geom_smooth()

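geom_smooth is what you were looking for, I think. Note that with its defaults it fits a smoothed curve through the points for each group (loess for small data sets, rather than a linear model), so the line approximates the group means instead of passing exactly through them. The group = Condition mapping is what lets the line run across the time levels when x is a factor.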
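If the goal is straight segments through the group means while time stays a factor, one option is to keep the explicit group mapping and swap the smoother for stat_summary. With a discrete x, ggplot2 groups by the interaction of all mapped discrete variables, so without group = Condition each time level becomes its own group and nothing connects across the axis. A minimal sketch, assuming ggplot2 >= 3.3.0 for the fun argument; the seed and the name p are arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  Condition = factor(rep(c(0, 1), each = 15)),
  time = factor(rep(1:5, times = 6)),
  eat = sample(1:7, size = 30, replace = TRUE)
)

# group = Condition overrides the default time x Condition grouping,
# so each line runs across all five time levels
p <- ggplot(df, aes(x = time, y = eat, colour = Condition, group = Condition)) +
  geom_point(size = 3) +
  stat_summary(fun = mean, geom = "line")
```

Calling print(p) should show one line per condition, each passing through the five per-time means.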

Upvotes: 1

Gregor Thomas

Reputation: 145755

Here's my best guess at what you want:

# keep time as numeric
df$time = as.numeric(as.character(df$time))
ggplot(df, aes(x = time, y = eat, group = Condition)) +
    geom_smooth(
        aes(fill = Condition, linetype = Condition),
        method = "lm",
        level = 0.65,
        color = "black",
        size = 0.3
    ) +
    geom_point(aes(color = Condition))


Setting level = 0.65 gives a band of about +/- 1 standard error around the linear model fit.
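As a rough check on that figure, the multiplier implied by a given confidence level can be computed directly. This sketch assumes the band is a standard two-sided confidence interval; the 13 residual degrees of freedom assume 15 points per group and a two-coefficient line, and the variable names are only illustrative.

```r
level <- 0.65

# Normal-approximation multiplier for a two-sided interval at this level
z_mult <- qnorm((1 + level) / 2)        # about 0.93

# Coverage of exactly +/- 1 standard error under the normal approximation
cover_1se <- 2 * pnorm(1) - 1           # about 0.68

# t-based multiplier, assuming 13 residual degrees of freedom
t_mult <- qt((1 + level) / 2, df = 13)  # slightly larger than z_mult
```

So level = 0.65 corresponds to a little under one standard error, and a level near 0.68 would be closer to exactly one under the normal approximation.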

Upvotes: 1
