Reputation: 3663
I have 1417 days of sales data from 2012-01-01 to the present (2015-11-20). I can't figure out how to plot each year's sales against a single one-year axis (Jan 1 - Dec 31), so that all years share the same year-long window, even when using ggplot's color = as.factor(Year) option.
Total sales are type int
head(df$Total.Sales)
[1] 495 699 911 846 824 949
and I have used the lubridate package to pull Year out of the original Day variable.
df$Day <- as.Date(as.numeric(df$Day), origin="1899-12-30")
df$Year <- year(df$Day)
But because Day contains the year information
sample(df$Day, 1)
[1] "2012-05-05"
ggplot is still graphing the years one after another along the axis instead of synchronizing them to the same period of time (one full year):
g <- ggplot(df, aes(x = Day, y = Total.Sales, color = as.factor(Year))) +
geom_line()
Upvotes: 1
Views: 873
Reputation: 15937
I create some sample data as follows:
set.seed(1234)
dates <- seq(as.Date("2012-01-01"), as.Date("2015-11-20"), by = "1 day")
values <- sample(1:6000, size = length(dates))
data <- data.frame(date = dates, value = values)
Providing something of the sort is, by the way, what is meant by a reproducible example.
Then I prepare some additional columns:
library(lubridate)
data$year <- year(data$date)
data$day_of_year <- as.Date(paste("2012",
month(data$date),mday(data$date), sep = "-"))
The last line is almost certainly what Roland meant in his comment. He was right to choose a leap year, because it contains every possible date; a normal year would be missing February 29th.
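To illustrate the leap-year point, here is a small base R sketch. The month/day pairs in md are made up for the illustration, and the explicit format argument is my own addition so that an invalid date comes back as NA rather than stopping with an error:

```r
# Hypothetical month/day pairs, including the leap day.
md <- c("1-1", "2-29", "12-31")

# Mapped onto the leap year 2012, every pair is a valid date.
leap <- as.Date(paste("2012", md, sep = "-"), format = "%Y-%m-%d")

# Mapped onto 2013, February 29th does not exist and becomes NA.
non_leap <- as.Date(paste("2013", md, sep = "-"), format = "%Y-%m-%d")

is.na(leap)
is.na(non_leap)
```

With a non-leap reference year, every sale recorded on a February 29th would therefore drop out of the plot.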
Now the plot is generated by:
library(ggplot2)
library(scales)
g <- ggplot(data, aes(x = day_of_year, y = value, color = as.factor(year))) +
geom_line() + scale_x_date(labels = date_format("%m/%d"))
I call scale_x_date to define x-axis labels without the year. This relies on the function date_format from the package scales. The string "%m/%d" defines the date format. If you want to know more about these format strings, use ?strptime.
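If it helps, you can try a format string on a single date with base R's format() before putting it on an axis. This is only a quick sketch; the date is the one from the question's sample output, and the second format string is just an arbitrary alternative:

```r
d <- as.Date("2012-05-05")

# Month and day only, as used for the axis labels.
md_label <- format(d, "%m/%d")

# Another numeric variant: day first, then month.
dm_label <- format(d, "%d.%m.")

md_label
dm_label
```

Neither label contains the year, which is exactly what the shared axis needs.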
The figure looks as follows:
You can see immediately what might be the trouble with this representation: it is hard to distinguish anything on this plot. Of course, this is partly because my sample data varies wildly, and your data might look different. If it does not, consider using faceting (see ?facet_grid or ?facet_wrap).
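A rough sketch of the faceting idea, assuming the same kind of random sample data as above: one panel per year, stacked vertically, all sharing the year-free x-axis. I use base R's format() here instead of lubridate, and the date_labels argument of scale_x_date instead of date_format, only to keep the example self-contained; stacking the panels in one column (ncol = 1) is just one possible layout.

```r
library(ggplot2)

# Random sample data covering the same period as in the question.
set.seed(1234)
dates <- seq(as.Date("2012-01-01"), as.Date("2015-11-20"), by = "1 day")
data <- data.frame(date = dates,
                   value = sample(1:6000, size = length(dates)))

# Year for the panels; month and day mapped onto the leap year 2012.
data$year <- as.integer(format(data$date, "%Y"))
data$day_of_year <- as.Date(paste("2012", format(data$date, "%m-%d"), sep = "-"))

# One panel per year, so the lines no longer overlap.
g <- ggplot(data, aes(x = day_of_year, y = value)) +
  geom_line() +
  scale_x_date(date_labels = "%m/%d") +
  facet_wrap(~ year, ncol = 1)
```

Printing g draws the panels. Whether this reads better than the overlaid lines depends on how noisy your real data is.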
Upvotes: 2