Brad

Reputation: 29

ggplot2: Reshaping data to plot multiple Y values for each X Value

I have a data frame containing 2 weeks of data on how many passengers rode a train each day. Each observation has 3 values: the date, the number of passengers, and the day of the week. I want to compare the passengers on each day of the previous week with the same day this week (Monday to Monday, Tuesday to Tuesday, etc.). Here is the data:

structure(list(total = structure(c(17455, 17456, 17457, 17458, 
17459, 17460, 17461, 17462, 17463, 17464, 17465, 17466, 17467, 
17468), class = "Date"), passengers = c(9299L, 9166L, 10234L, 
10176L, 10098L, 2867L, 5416L, 9312L, 10555L, 10858L, 10169L, 
9515L, 2679L, 5490L), dow = c("Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Saturday", "Sunday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), .Names = 
c("total", "passengers", "dow"), class = "data.frame")

(The automated system that created the reports uses the term "total" for dates. I am pointing that out because it might be confusing.)

When I create a ggplot, it only maps 1 y value for a bar chart instead of 2 side by side:

ggplot(x, aes(x=dow, y=passengers), fill=variable) + 
  geom_bar(stat = "identity", position = "dodge")

I have seen reshape used to melt the data in cases like this, but when I melt using the day of the week as the id.vars value, the date is converted to scientific notation (small problem) and ggplot cannot find the passengers variable (big problem).

Upvotes: 0

Views: 1876

Answers (1)

Z.Lin

Reputation: 29125

Some issues to be addressed:

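  1. You specified fill = variable, but there is no column named "variable" in your data frame (the argument also sits outside aes(), so it is not mapped to anything);
  2. You expect 2 dodged bars side by side for each day, but nothing in the data tells ggplot how to split the rows into the two bars.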

I would wrangle the data frame first:

library(dplyr)

df <- x %>%
  mutate(week = format(total, "%V"),
         dow = factor(dow, levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                                      "Friday", "Saturday", "Sunday")))

> head(df)
       total passengers       dow week
1 2017-10-16       9299    Monday   42
2 2017-10-17       9166   Tuesday   42
3 2017-10-18      10234 Wednesday   42
4 2017-10-19      10176  Thursday   42
5 2017-10-20      10098    Friday   42
6 2017-10-21       2867  Saturday   42

This adds a "week" variable, which takes the value 42 for the first 7 rows and 43 for the next 7. The days of the week are also now ordered from Monday to Sunday instead of alphabetically.
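If format(total, "%V") is not convenient (support for %V has historically depended on the platform), a week label can also be derived from the dates alone. The following is a minimal base R sketch, assuming the data are consecutive days starting on a Monday, as in the question. The column name week_label and the "Week 1"/"Week 2" labels are made up for illustration.

```r
# Rebuild the question's data frame (dates are days since 1970-01-01)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
x <- data.frame(
  total = as.Date(17455:17468, origin = "1970-01-01"),
  passengers = c(9299L, 9166L, 10234L, 10176L, 10098L, 2867L, 5416L,
                 9312L, 10555L, 10858L, 10169L, 9515L, 2679L, 5490L),
  dow = rep(days, times = 2),
  stringsAsFactors = FALSE
)

# Whole weeks elapsed since the first date: 0 for rows 1-7, 1 for rows 8-14
weeks_elapsed <- as.integer(x$total - min(x$total)) %/% 7L

# Hypothetical label column; a factor so ggplot treats it as discrete
x$week_label <- factor(paste("Week", weeks_elapsed + 1L))

table(x$week_label)
```

Mapping fill = week_label in the plot should then dodge the bars in the same way as fill = week.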

ggplot(df, 
       aes(x = dow, y = passengers, fill = week)) + 
  geom_col(position = "dodge")

geom_col() is equivalent to geom_bar(stat = "identity"), but requires less typing.

(Plot: passengers by day of week, with weeks 42 and 43 as side-by-side bars.)
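To read off the numbers behind the bars, the same data can be cross-tabulated into one row per day and one column per week. This is only a sketch in base R, assuming the question's 14 rows. The week labels are assigned by position (first 7 rows, next 7 rows) rather than with format(), and the names wide and change are illustrative.

```r
# Rebuild the question's data frame (dates are days since 1970-01-01)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
x <- data.frame(
  total = as.Date(17455:17468, origin = "1970-01-01"),
  passengers = c(9299L, 9166L, 10234L, 10176L, 10098L, 2867L, 5416L,
                 9312L, 10555L, 10858L, 10169L, 9515L, 2679L, 5490L),
  dow = factor(rep(days, times = 2), levels = days),
  stringsAsFactors = FALSE
)

# Assumption: rows 1-7 are the first week, rows 8-14 the second
x$week <- rep(c("Week 1", "Week 2"), each = 7)

# One row per day, one column per week; each cell holds that day's passengers
wide <- xtabs(passengers ~ dow + week, data = x)

# Day-by-day difference between the two weeks
change <- wide[, "Week 2"] - wide[, "Week 1"]

print(wide)
print(change)
```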

Upvotes: 2
