Reputation: 1
I need to make a line plot in ggplot2 using a data frame called df that looks something like this:
DATE ITEM NUMBER_SOLD
<date> <chr> <int>
1 2018-01-08 APPLE 3
2 2018-01-09 APPLE 3
3 2018-01-09 PEAR 2
4 2018-01-09 ORANGE 1
5 2018-01-10 APPLE 2
6 2018-01-10 PEAR 1
7 2018-01-12 CHERRY 2
8 2018-01-12 MANGO 1
9 2018-01-15 PINEAPPLE 1
10 2018-01-15 APRICOT 1
etc
The data frame is a tibble with 336 rows showing how many times each item was sold on a given day in 2018.
The plot needs to be a line plot of the sales of one particular item (apple), with the date on the x axis and the number sold on the y axis, plus a second line showing a 15% increase in sales, like this:
df %>%
  filter(ITEM == "APPLE") %>%
  ggplot(aes(DATE, NUMBER_SOLD)) +
  geom_line(size = 1, col = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  geom_line(aes(y = NUMBER_SOLD + NUMBER_SOLD/100*15), col = "green4", size = 1, alpha = 0.6) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month")
However, I also need to add a legend showing what each line represents, e.g. the red line for the original number of sales and the green one for the original number of sales + 15%. How might I achieve that?
Upvotes: 0
Views: 61
Reputation: 8107
The trick is to do the calculation in the data frame first, then use gather() to reshape the data to long format, so that the numbers sit in one column and a second variable indicates whether each number is the actual or the expected sale.
library(tidyverse)

df <- tribble(~"DATE", ~"ITEM", ~"NUMBER_SOLD",
              "2018-01-08", "APPLE", 3,
              "2018-01-09", "APPLE", 3,
              "2018-01-09", "PEAR", 2,
              "2018-01-09", "ORANGE", 1,
              "2018-01-10", "APPLE", 2,
              "2018-01-10", "PEAR", 1,
              "2018-01-12", "CHERRY", 2,
              "2018-01-12", "MANGO", 1,
              "2018-01-15", "PINEAPPLE", 1,
              "2018-01-15", "APRICOT", 1) %>%
  # parse the dates and compute the expected (+15%) sales as a new column
  mutate(DATE = parse_date(DATE),
         NUMBER_SOLD_EXP = NUMBER_SOLD + NUMBER_SOLD/100*15) %>%
  # stack both sales columns into SOLD, labelled by category
  gather(key = category, value = SOLD, NUMBER_SOLD, NUMBER_SOLD_EXP)

df
# A tibble: 20 x 4
DATE ITEM category SOLD
<date> <chr> <chr> <dbl>
1 2018-01-08 APPLE NUMBER_SOLD 3
2 2018-01-09 APPLE NUMBER_SOLD 3
3 2018-01-09 PEAR NUMBER_SOLD 2
4 2018-01-09 ORANGE NUMBER_SOLD 1
5 2018-01-10 APPLE NUMBER_SOLD 2
6 2018-01-10 PEAR NUMBER_SOLD 1
7 2018-01-12 CHERRY NUMBER_SOLD 2
8 2018-01-12 MANGO NUMBER_SOLD 1
9 2018-01-15 PINEAPPLE NUMBER_SOLD 1
10 2018-01-15 APRICOT NUMBER_SOLD 1
11 2018-01-08 APPLE NUMBER_SOLD_EXP 3.45
12 2018-01-09 APPLE NUMBER_SOLD_EXP 3.45
13 2018-01-09 PEAR NUMBER_SOLD_EXP 2.3
14 2018-01-09 ORANGE NUMBER_SOLD_EXP 1.15
15 2018-01-10 APPLE NUMBER_SOLD_EXP 2.3
16 2018-01-10 PEAR NUMBER_SOLD_EXP 1.15
17 2018-01-12 CHERRY NUMBER_SOLD_EXP 2.3
18 2018-01-12 MANGO NUMBER_SOLD_EXP 1.15
19 2018-01-15 PINEAPPLE NUMBER_SOLD_EXP 1.15
20 2018-01-15 APRICOT NUMBER_SOLD_EXP 1.15
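As a side note, the same reshaping can be sketched with pivot_longer(), which tidyr (version 1.0.0 and later) documents as the successor to gather(). The three-row sales tibble and the name sales_long below are made up for illustration and are not part of the original data; the row order also differs from the gather() output, which does not matter for plotting.

```r
library(dplyr)
library(tidyr)

# Hypothetical apple-only sample, standing in for the real data
sales <- tibble(
  DATE = as.Date(c("2018-01-08", "2018-01-09", "2018-01-10")),
  ITEM = "APPLE",
  NUMBER_SOLD = c(3, 3, 2)
)

sales_long <- sales %>%
  # expected sales: the original number plus 15%
  mutate(NUMBER_SOLD_EXP = NUMBER_SOLD * 1.15) %>%
  # stack the two sales columns: their names go to `category`,
  # their values go to `SOLD`
  pivot_longer(
    cols = c(NUMBER_SOLD, NUMBER_SOLD_EXP),
    names_to = "category",
    values_to = "SOLD"
  )

sales_long
```

Either function should give the plotting code what it needs: one value column and one column naming the series.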
Now you only need to call geom_line() once, mapping the colour aesthetic to the variable that indicates whether the number is the actual or the expected sale; ggplot2 then draws the legend for you. Add scale_colour_manual() to specify which colour to attach to each category.
df %>%
  filter(ITEM == "APPLE") %>%
  ggplot(aes(DATE, SOLD)) +
  geom_line(aes(colour = category), size = 1) +
  scale_colour_manual(values = c("NUMBER_SOLD" = "red", "NUMBER_SOLD_EXP" = "green")) +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month")
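By default the legend shows the raw category values (NUMBER_SOLD and NUMBER_SOLD_EXP). If you want more readable entries, one option is to pass name, breaks and labels to scale_colour_manual(). The sketch below uses a small made-up apple-only tibble (apples_long) already in long format; the legend title and label wording are only suggestions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical apple-only data, already in long format
apples_long <- tibble(
  DATE = as.Date(rep(c("2018-01-08", "2018-01-09", "2018-01-10"), times = 2)),
  category = rep(c("NUMBER_SOLD", "NUMBER_SOLD_EXP"), each = 3),
  SOLD = c(3, 3, 2, 3.45, 3.45, 2.3)
)

p <- ggplot(apples_long, aes(DATE, SOLD, colour = category)) +
  geom_line() +
  scale_colour_manual(
    name = "Sales",                                   # legend title
    values = c(NUMBER_SOLD = "red", NUMBER_SOLD_EXP = "green4"),
    breaks = c("NUMBER_SOLD", "NUMBER_SOLD_EXP"),     # order of legend entries
    labels = c("Number sold", "Number sold + 15%")    # text shown for each entry
  )

p
```

Here breaks and labels are matched by position, so the first label belongs to NUMBER_SOLD and the second to NUMBER_SOLD_EXP.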
Upvotes: 2