DJC78

Can't do a multiple line plot with ggplot2

I have a dataset with a column of unemployment, a column of months, and one for years.

I want to make a line plot with the month number on the x axis, unemployment on the y axis, and one line per year.

I first filtered the data frame by year to get the y values for each year separately, and then tried the following code:

y1 = df %>% filter(year == 1996) 
y1 = y1$unemploy
y2 = df %>% filter(year == 1997)
y2 = y2$unemploy
y3 = df %>% filter(year == 1998)
y3 = y3$unemploy 

plot1 = ggplot() +  
  geom_line(mapping = aes(x = df$month, y = y1), color = "navyblue") +
  geom_line(mapping = aes(x = df$month,y = y2), color = "black") +
  geom_line(mapping = aes(x = df$month,y = y3), color = "red") +
  scale_y_continuous(limits=c(0,10)) +
  scale_x_continuous(limits=c(1,15)) 
plot1

But when I try to print the plot, I get the following error message:

Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (128): y
Run `rlang::last_error()` to see where the error occurred.

Does anyone know what could be the problem with this plot?

The output of dput(head(df,20)) is the following:

dput(head(df, 20))
structure(list(unemploy = c(6.7, 6.7, 6.4, 5.9, 5.2, 4.8, 4.8, 
4, 4.2, 4.4, 5, 5, 6.4, 6.5, 6.3, 5.9, 4.9, 4.8, 4.5, 4), month = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L), year = c(1996L, 1996L, 1996L, 1996L, 1996L, 
1996L, 1996L, 1996L, 1996L, 1996L, 1996L, 1996L, 1997L, 1997L, 
1997L, 1997L, 1997L, 1997L, 1997L, 1997L)), row.names = c(NA, 
20L), class = "data.frame")

Answers (3)

zephryl

Rather than filtering by year and including three separate geom_line()s, just pass the full dataframe to ggplot() and map year to the color and group aesthetics. You can specify your colors using scale_color_manual().

library(ggplot2)
library(scales)

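ggplot(data = df) +
  # factor() keeps the colour scale discrete even if year is stored as an integer,
  # which scale_color_manual() requires
  geom_line(aes(month, unemploy, color = factor(year), group = year)) +
  scale_color_manual(values = c("navyblue", "black", "red")) +
  scale_y_continuous(
    limits = c(0, .1),
    labels = percent  # from the scales package
  ) +
  labs(color = "year") +
  theme_light()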

Example data:

set.seed(13)
df <- expand.grid(month = factor(1:12), year = factor(1996:1998))
unemploy <- vector("double", 36)
unemploy[[1]] <- .05
for (i in 2:36) {
  unemploy[[i]] <- unemploy[[i - 1]] + rnorm(1, 0, .005)
}
df$unemploy <- unemploy

Rui Barradas

Without assuming the data contains only the years 1996 to 1998, first keep only those years, coerce year to a factor, and set the aesthetics in the initial call to ggplot(). Mapping color to a discrete variable also takes care of grouping the lines.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

df %>%
  filter(year %in% 1996:1998) %>%
  mutate(year = factor(year)) %>%
  ggplot(aes(x = month, y = unemploy, color = year)) +  
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("navyblue", "black", "red")) +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 12), breaks = 1:12) +
  labs(x = "Month", y = "Unemployment") +
  theme_bw()

Created on 2022-11-14 with reprex v2.0.2
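One way to check that mapping color to a factor is what groups the lines is to inspect the built layer data with layer_data(). The toy data below is made up for illustration. With an integer year the colour scale is continuous and every point falls into a single group (so geom_line() would join all years into one zig-zag line), whereas factor(year) yields one group per year:

```r
library(ggplot2)

# Hypothetical toy data: 12 months for each of three years
toy <- expand.grid(month = 1:12, year = 1996:1998)
toy$unemploy <- seq(4, 7, length.out = nrow(toy))

# Integer year: continuous colour scale, so no discrete variable defines groups
p_int <- ggplot(toy, aes(month, unemploy, color = year)) + geom_line()
# Factor year: discrete colour scale, which also splits the data into groups
p_fct <- ggplot(toy, aes(month, unemploy, color = factor(year))) + geom_line()

n_groups_int <- length(unique(layer_data(p_int)$group))
n_groups_fct <- length(unique(layer_data(p_fct)$group))

n_groups_int  # 1
n_groups_fct  # 3
```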


Data creation code

The data creation code is borrowed from zephryl's answer with a few changes:

  • The years are extended so that the later filter() step actually has rows to drop and keeps only the years in the question;
  • month and year are numeric, not factors;
  • there is no need for a for loop: unemploy is created with cumsum(), and its values are consistent with the y axis limits;
  • there is no need to create a separate variable, unemploy, and then assign it to the data frame.

set.seed(13)
df <- expand.grid(month = 1:12, year = 1995:1999)
df$unemploy <- 8 + cumsum(rnorm(nrow(df), sd = 0.25))

Created on 2022-11-14 with reprex v2.0.2
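The point about replacing the for loop with cumsum() can be checked directly. Assuming R's default random number generator, the loop from zephryl's example data and a one-line cumsum() version draw the same 35 normal increments after set.seed(13), so the two random walks agree up to floating-point rounding:

```r
# Loop version, as in zephryl's example data
set.seed(13)
loop_walk <- vector("double", 36)
loop_walk[[1]] <- .05
for (i in 2:36) {
  loop_walk[[i]] <- loop_walk[[i - 1]] + rnorm(1, 0, .005)
}

# Vectorised version: the same 35 draws, accumulated with cumsum()
set.seed(13)
cumsum_walk <- cumsum(c(.05, rnorm(35, 0, .005)))

# all.equal() rather than identical(), since cumsum() may accumulate
# in higher precision than the loop's double additions
all.equal(loop_walk, cumsum_walk)  # TRUE
```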

M--

For the method you are using, you need to give each geom its own data set. You map every row of df$month to x but only a subset of the unemploy column to y, so x and y have different lengths and ggplot2 throws an error.
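To see the mismatch concretely, here is a small sketch using only the 20 rows from the question's dput() output, not the full data set. The x vector handed to the first geom_line() has one entry per row of df, while y1 has one entry per 1996 row:

```r
library(dplyr)

# The first 20 rows of the question's data, taken from its dput() output
df <- data.frame(
  unemploy = c(6.7, 6.7, 6.4, 5.9, 5.2, 4.8, 4.8, 4, 4.2, 4.4, 5, 5,
               6.4, 6.5, 6.3, 5.9, 4.9, 4.8, 4.5, 4),
  month = c(1:12, 1:8),
  year = rep(c(1996L, 1997L), times = c(12, 8))
)

# What the question maps to x: the month of every row in df
x_values <- df$month
# What the question maps to y: unemployment for the 1996 rows only
y1 <- df %>% filter(year == 1996)
y1 <- y1$unemploy

length(x_values)  # 20
length(y1)        # 12
```

With the full data, x has one entry per row of df, which is presumably the 128 reported in the error message.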

library(dplyr)
library(ggplot2)

# One data frame per year
y1 = df %>% filter(year == 1996)
y2 = df %>% filter(year == 1997)
y3 = df %>% filter(year == 1998)

# geom_line()'s first argument is `mapping`, so each data frame must be passed as `data =`
plot1 = ggplot() +
  geom_line(data = y1, aes(x = month, y = unemploy), color = "navyblue") +
  geom_line(data = y2, aes(x = month, y = unemploy), color = "black") +
  geom_line(data = y3, aes(x = month, y = unemploy), color = "red") +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 15))
plot1

But better practice is to map year to color inside aes():

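df %>%
  filter(year %in% 1996:1998) %>%
  ggplot() +
  # factor(year) gives a discrete colour scale, which also draws one line per year
  geom_line(aes(x = month, y = unemploy, color = factor(year))) +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 15)) +
  labs(color = "year")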

You can add scale_color_manual() afterwards if you want those specific colors instead of the ggplot2 defaults.
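If you do add scale_color_manual(), a named values vector ties each colour to a specific year level, so the colours stay attached to the right years even if the levels are reordered or a year is missing. A sketch on made-up data (the name toy is hypothetical), assuming year has been converted to a factor:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the question's df
toy <- expand.grid(month = 1:12, year = 1995:1999)
toy$unemploy <- seq(4, 7, length.out = nrow(toy))

p <- toy %>%
  filter(year %in% 1996:1998) %>%
  mutate(year = factor(year)) %>%
  ggplot(aes(x = month, y = unemploy, color = year)) +
  geom_line() +
  # Names are matched against the factor levels, not their position
  scale_color_manual(values = c("1996" = "navyblue", "1997" = "black", "1998" = "red"))

p
```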
