Reputation: 47
I have a dataset with a column of unemployment, a column of months, and one for years.
I want to make a line plot with month number on the x axis and unemployment on the y axis, where each line represents a different year.
I first filtered the dataframe by year to have the y values for each year individually and I tried the following code:
y1 = df %>% filter(year == 1996)
y1 = y1$unemploy
y2 = df %>% filter(year == 1997)
y2 = y2$unemploy
y3 = df %>% filter(year == 1998)
y3 = y3$unemploy
plot1 = ggplot() +
  geom_line(mapping = aes(x = df$month, y = y1), color = "navyblue") +
  geom_line(mapping = aes(x = df$month, y = y2), color = "black") +
  geom_line(mapping = aes(x = df$month, y = y3), color = "red") +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 15))
plot1
But when I try to print the plot, I get the following error message:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (128): y
Run `rlang::last_error()` to see where the error occurred.
Does anyone know what could be the problem with this plot?
The output of dput(head(df, 20)) is the following:
dput(head(df, 20))
structure(list(unemploy = c(6.7, 6.7, 6.4, 5.9, 5.2, 4.8, 4.8,
4, 4.2, 4.4, 5, 5, 6.4, 6.5, 6.3, 5.9, 4.9, 4.8, 4.5, 4), month = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), year = c(1996L, 1996L, 1996L, 1996L, 1996L,
1996L, 1996L, 1996L, 1996L, 1996L, 1996L, 1996L, 1997L, 1997L,
1997L, 1997L, 1997L, 1997L, 1997L, 1997L)), row.names = c(NA,
20L), class = "data.frame")
Upvotes: 2
Views: 71
Reputation: 17079
Rather than filtering by year and including three separate geom_line()s, just pass the full dataframe to ggplot() and map year to the color and group aesthetics. You can specify your colors using scale_color_manual().
library(ggplot2)
library(scales)
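# year must be a factor (as in the example data below) for scale_color_manual()
ggplot(data = df) +
  geom_line(aes(month, unemploy, color = year, group = year)) +
  scale_color_manual(values = c("navyblue", "black", "red")) +
  scale_y_continuous(
    limits = c(0, .1),
    labels = percent  # the example data stores unemployment as a proportion
  ) +
  theme_light()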
Example data:
set.seed(13)
df <- expand.grid(month = factor(1:12), year = factor(1996:1998))
unemploy <- vector("double", 36)
unemploy[[1]] <- .05
for (i in 2:36) {
  unemploy[[i]] <- unemploy[[i - 1]] + rnorm(1, 0, .005)
}
df$unemploy <- unemploy
Upvotes: 1
Reputation: 76402
Without assuming the data contains only the years 1996 to 1998, first keep only those years, coerce year to a factor, and set the aesthetics in the initial call to ggplot. The color aesthetic will take care of grouping the lines.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
})
df %>%
  filter(year %in% 1996:1998) %>%
  mutate(year = factor(year)) %>%
  ggplot(aes(x = month, y = unemploy, color = year)) +
  geom_line(linewidth = 1) +
  scale_color_manual(values = c("navyblue", "black", "red")) +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 12), breaks = 1:12) +
  labs(x = "Month", y = "Unemployment") +
  theme_bw()
Created on 2022-11-14 with reprex v2.0.2
The data creation code is borrowed from zephryl's answer with a few changes:
- the data covers more years (1995 to 1999), so that filter actually keeps only the years in the question;
- month and year are numeric, not factors;
- instead of a for loop, unemploy is created with cumsum and its values are consistent with the y axis limits;
- there is no need to first create a separate vector unemploy, and then assign it to the data.frame.
set.seed(13)
df <- expand.grid(month = 1:12, year = 1995:1999)
df$unemploy <- 8 + cumsum(rnorm(nrow(df), sd = 0.25))
Upvotes: 0
Reputation: 28850
With the method you are using, you need a different dataset for each geom. You map every row of month in your df to x but only a subset of the unemploy column to y, so x and y have different lengths and ggplot returns an error.
library(dplyr)
library(ggplot2)
y1 = df %>% filter(year == 1996)
y2 = df %>% filter(year == 1997)
y3 = df %>% filter(year == 1998)
plot1 = ggplot() +
  # pass data by name: the first positional argument of geom_line() is mapping
  geom_line(data = y1, aes(x = month, y = unemploy), color = "navyblue") +
  geom_line(data = y2, aes(x = month, y = unemploy), color = "black") +
  geom_line(data = y3, aes(x = month, y = unemploy), color = "red") +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 15))
plot1
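To make the length mismatch behind the original error concrete, here is a minimal check. It assumes only the 20 rows shown in the question's dput() output (the full data has 128 rows, according to the error message), and the name df_head is made up for this illustration.

```r
# First 20 rows of the question's data, rebuilt from the dput() output.
# df_head is a hypothetical name used only for this illustration.
df_head <- data.frame(
  unemploy = c(6.7, 6.7, 6.4, 5.9, 5.2, 4.8, 4.8, 4, 4.2, 4.4, 5, 5,
               6.4, 6.5, 6.3, 5.9, 4.9, 4.8, 4.5, 4),
  month = c(1:12, 1:8),
  year = rep(c(1996L, 1997L), times = c(12, 8))
)

# The y values for a single year, as in the question
y1 <- df_head$unemploy[df_head$year == 1996]

length(df_head$month)  # 20: x uses every row
length(y1)             # 12: y uses only the 1996 rows
```

Because x and y have different lengths, ggplot cannot pair them up, which is what the "Aesthetics must be either length 1 or the same as the data" message reports.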
But better practice is to use color within the mapping:
df %>%
  filter(year %in% 1996:1998) %>%
  ggplot() +
  # factor(year) gives one line per year; a numeric year would be drawn as a single line
  geom_line(aes(x = month, y = unemploy, color = factor(year))) +
  scale_y_continuous(limits = c(0, 10)) +
  scale_x_continuous(limits = c(1, 15))
You can use scale_color_manual later, if you want those specific colors and don't like the default ggplot colors.
Upvotes: 2