d8aninja

Synchronous X-Axis For Multiple Years of Sales with ggplot

I have 1417 days of sales data from 2012-01-01 to the present (2015-11-20). I can't figure out how to use a single-year x-axis (Jan 1 to Dec 31) and plot each year's sales over that same one-year window, even when using ggplot's color = as.factor(Year) option.

Total sales are of type int:

head(df$Total.Sales)
[1] 495 699 911 846 824 949

I have also used the lubridate package to extract Year from the original Day variable:

df$Day <- as.Date(as.numeric(df$Day), origin="1899-12-30") 
df$Year <- year(df$Day)
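As a quick sanity check of the conversion above, here is a minimal sketch with two hypothetical serial numbers (not taken from the question's data), chosen to land on the first and last days of the stated range. It assumes, as the question's code does, that Day holds day counts from Excel's 1900 date system, for which 1899-12-30 is the usual R origin:

```r
library(lubridate)

# Hypothetical Excel serial numbers (numeric day counts), not the asker's data
serials <- c(40909, 42328)

# Excel's 1900 date system corresponds to an R origin of 1899-12-30
days <- as.Date(serials, origin = "1899-12-30")
print(days)        # 2012-01-01 and 2015-11-20

# year() from lubridate extracts the calendar year as a number
print(year(days))  # 2012 and 2015
```

If the printed dates come out shifted by a few days or decades, the source is probably using a different origin (for example Excel's 1904 date system).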

But because Day contains the year information

sample(df$Day, 1)
[1] "2012-05-05"

ggplot still plots the years one after another along the date axis instead of overlaying them on the same one-year period:

g <- ggplot(df, aes(x = Day, y = Total.Sales, color = as.factor(Year))) +
        geom_line()

[Figure: the resulting plot, with the years drawn one after another along the date axis]

Answers (1)

Stibu

I create some sample data as follows:

set.seed(1234)
dates <- seq(as.Date("2012-01-01"), as.Date("2015-11-20"), by = "1 day")
values <- sample(1:6000, size = length(dates))
data <- data.frame(date = dates, value = values)

Providing something like this is, by the way, what is meant by a reproducible example.

Then I prepare some additional columns:

library(lubridate)
data$year <- year(data$date)
data$day_of_year <- as.Date(paste("2012",
                    month(data$date),mday(data$date), sep = "-"))

The last line is almost certainly what Roland meant in his comment. He was also right to choose a leap year, because it contains every possible month-day combination: a non-leap year has no February 29th, so those dates could not be mapped onto it.
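The same mapping can also be written with format() alone, pinning every date to the reference year 2012 while keeping its month and day. This is a sketch of an alternative to the paste() approach, not a replacement for it; the dates below are made up and include a leap day:

```r
# Hypothetical dates spanning several years, including a leap day
dates <- as.Date(c("2012-02-29", "2013-02-28", "2014-07-04", "2015-11-20"))

# Keep month and day, but force the year to 2012 (a leap year),
# so every month-day pair, including Feb 29, maps to a valid Date
day_of_year <- as.Date(format(dates, "2012-%m-%d"))
print(day_of_year)
# 2012-02-29, 2012-02-28, 2012-07-04, 2012-11-20
```

Because format() always emits two-digit months and days, the resulting strings are in the standard "%Y-%m-%d" form that as.Date() parses by default.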

Now the plot is generated with:

library(ggplot2)
library(scales)
g <- ggplot(data, aes(x = day_of_year, y = value, color = as.factor(year))) +
   geom_line() + scale_x_date(labels = date_format("%m/%d"))

I call scale_x_date to define x-axis labels without the year. This relies on the function date_format from the scales package. The string "%m/%d" defines the date format; for more on these format strings, see ?strptime.
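To preview what a format string produces before putting it into the plot, format() can be applied to a few dates directly. The dates here are hypothetical, all pinned to the reference year 2012. As a side note, recent ggplot2 versions also accept the same string through the date_labels argument, as in scale_x_date(date_labels = "%m/%d"), which avoids the call to scales::date_format.

```r
# Hypothetical dates, all pinned to the reference year 2012
d <- as.Date(c("2012-01-01", "2012-05-05", "2012-12-31"))

# Month/day only, as used for the x-axis labels
labels <- format(d, "%m/%d")
print(labels)          # "01/01" "05/05" "12/31"

# Abbreviated month name plus day; the month name depends on the locale
print(format(d, "%b %d"))
```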

The figure looks as follows:

[Figure: the resulting plot, with all years overlaid on a single 01/01 to 12/31 axis, one color per year]

The trouble with this representation is visible immediately: it is hard to distinguish anything on the plot. Part of this is because my sample data varies wildly, and your data might look different. If it doesn't, consider faceting (see ?facet_grid or ?facet_wrap).
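A minimal faceting sketch, assuming the same random sample data and the year and day_of_year columns built above (regenerated here so the block runs on its own). Each year gets its own panel, stacked in a single column so that the shared one-year x-axis lines up across panels; the date_labels argument assumes a reasonably recent ggplot2.

```r
library(lubridate)
library(ggplot2)

# Same sample data as above (random values, for illustration only)
set.seed(1234)
dates <- seq(as.Date("2012-01-01"), as.Date("2015-11-20"), by = "1 day")
data <- data.frame(date = dates, value = sample(1:6000, size = length(dates)))
data$year <- year(data$date)

# Map every date onto the leap year 2012, keeping month and day
data$day_of_year <- as.Date(paste("2012",
                    month(data$date), mday(data$date), sep = "-"))

# One panel per year, stacked in one column; the x-scale is shared by default
p <- ggplot(data, aes(x = day_of_year, y = value)) +
  geom_line() +
  facet_wrap(~ year, ncol = 1) +
  scale_x_date(date_labels = "%m/%d")
print(p)
```

Stacking the panels vertically (ncol = 1) is a design choice: it keeps the same calendar day at the same horizontal position in every panel, which is the comparison the question is after.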