jceg316

Reputation: 489

How to change x axis from years to months with ggplot2

I have a web visits over time chart which plots daily traffic from 2014 until now, and looks like this:

 ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
   geom_line()+
   scale_y_continuous(labels = comma)+
   ylim(0,50000)

As you can see, it's not a great graph. It would make more sense to break it down by month instead of by day. However, when I try this code:

 ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
   geom_line()+
   scale_y_continuous(labels = comma)+
   ylim(0,50000)+
   scale_x_date(date_breaks = "1 month", minor_breaks = "1 week", labels = date_format("%B"))

I get this error:

Error: Invalid input: date_trans works with objects of class Date only

The date field Post_Day is POSIXct. Page_Views is numeric. Data looks like:

Post_Title  Post_Day    Page_Views
Title 1     2016-05-15  139
Title 2     2016-05-15  61
Title 3     2016-05-15  79
Title 4     2016-05-16  125
Title 5     2016-05-17  374
Title 6     2016-05-17  39
Title 7     2016-05-17  464
Title 8     2016-05-17  319
Title 9     2016-05-18  84
Title 10    2016-05-18  64
Title 11    2016-05-19  433
Title 12    2016-05-19  418
Title 13    2016-05-19  124
Title 14    2016-05-19  422

I'm looking to change the X axis from a daily granularity into monthly.

Upvotes: 0

Views: 2607

Answers (2)

Uwe

Reputation: 42544

The sample data set shown in the question has multiple data points per day. So, it needs to be aggregated day-wise anyway. For the aggregation by day or month, data.table and lubridate are used.

Create sample data

As no reproducible example is supplied, a sample data set is created:

library(data.table)
n_rows <- 5000L
n_days <- 365L*3L
set.seed(123L)
DT <- data.table(Post_Title = paste("Title", 1:n_rows),
                 Post_Day = as.Date("2014-01-01") + sample(0:n_days, n_rows, replace = TRUE),
                 Page_Views = round(abs(rnorm(n_rows, 500, 200))))[order(Post_Day)]
DT
      Post_Title   Post_Day Page_Views
   1:   Title 74 2014-01-01        536
   2:  Title 478 2014-01-01        465
   3: Title 3934 2014-01-01        289
   4: Title 4136 2014-01-01        555
   5:  Title 740 2014-01-02        442
  ---                                 
4996: Title 1478 2016-12-31        586
4997: Title 2251 2016-12-31        467
4998: Title 2647 2016-12-31        468
4999: Title 3243 2016-12-31        498
5000: Title 4302 2016-12-31        309

Plot raw data

Without aggregation, the data can be plotted with:

library(ggplot2)
ggplot(DT) + aes(Post_Day, Page_Views) + geom_line()

Aggregated by day

ggplot(DT[, .(Page_Views = sum(Page_Views)), by = Post_Day]) + 
  aes(Post_Day, Page_Views) + geom_line()

To aggregate day-wise, the grouping parameter by of data.table is used with sum() as the aggregation function. The aggregation reduces the number of data points from 5000 to 1087, so the plot looks less cluttered.
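As a small check of the day-wise aggregation, the same by / sum() call can be run on the 14 sample rows shown in the question. This is only a sketch: sample_dt and daily are names made up for this illustration, the titles are left out, and Post_Day is assumed to be of class Date.

```r
library(data.table)

# The 14 sample rows from the question (Post_Title omitted for brevity)
sample_dt <- data.table(
  Post_Day = as.Date(c(
    "2016-05-15", "2016-05-15", "2016-05-15", "2016-05-16",
    "2016-05-17", "2016-05-17", "2016-05-17", "2016-05-17",
    "2016-05-18", "2016-05-18", "2016-05-19", "2016-05-19",
    "2016-05-19", "2016-05-19"
  )),
  Page_Views = c(139, 61, 79, 125, 374, 39, 464, 319, 84, 64, 433, 418, 124, 422)
)

# One row per day: sum the page views of all posts published on that day
daily <- sample_dt[, .(Page_Views = sum(Page_Views)), by = Post_Day]

nrow(sample_dt)  # 14 rows before aggregation
nrow(daily)      # 5 rows after aggregation, one per distinct day
daily
```

The grouped result keeps Post_Day as a Date column, so it can be passed to ggplot() in the same way as the full data set.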

Aggregated by month

ggplot(DT[, .(Page_Views = sum(Page_Views)), 
          by = .(Post_Month = lubridate::floor_date(Post_Day, "month"))]) + 
  aes(Post_Month, Page_Views) + geom_line()

To aggregate by month, the grouping parameter by is used again, but this time Post_Day is mapped to the first day of the respective month. So, 2014-03-26 becomes a Post_Month of 2014-03-01, which keeps the class of Post_Day (Date in the sample data above; a POSIXct column would stay POSIXct). This way, the x-axis remains continuous with a date scale. This avoids the trouble of converting Post_Day to a factor, e.g., "2014-03" using format(Post_Day, "%Y-%m"), where the x-axis would become discrete.
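The class of the column matters for the error in the question: scale_x_date() accepts only Date objects, while a POSIXct column calls for scale_x_datetime() or a conversion with as.Date(). The following sketch shows that floor_date() returns the class it was given, and one way to convert. The variable names and the "UTC" time zone are assumptions made for this illustration only.

```r
library(lubridate)

day_date    <- as.Date("2014-03-26")
day_posixct <- as.POSIXct("2014-03-26 14:30:00", tz = "UTC")

# floor_date() maps each value to the first day of its month
# and keeps the class of its input
month_date    <- floor_date(day_date, "month")
month_posixct <- floor_date(day_posixct, "month")

class(month_date)     # "Date"
class(month_posixct)  # "POSIXct" "POSIXt"

# Convert POSIXct to Date so that scale_x_date() accepts the column;
# the time zone is given explicitly to avoid an unintended day shift
day_converted <- as.Date(day_posixct, tz = "UTC")
class(day_converted)  # "Date"
```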

Upvotes: 1

Kalees Waran

Reputation: 659

APRA$month <- as.factor(strftime(APRA$Post_Day, "%m"))
APRA       <- APRA[order(as.numeric(APRA$month)),]

This adds a month column to your data.

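# sum the page views within each month (x is the subset for one month)
z <- sapply(split(APRA, APRA$month), function(x) sum(as.numeric(x$Page_Views)))
# build a data frame; month is numeric so that geom_line() connects the points
z <- data.frame(month = as.numeric(names(z)), Page_Views = unname(z))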

This creates a data frame z with one row per month and the total page views for that month.

Now plot it

ggplot(z, aes(x = month, y = Page_Views)) + geom_line()
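One caveat when grouping on "%m" alone is that the same month from different years is added together. A possible base R variation, sketched here under assumptions, groups on the first day of each month instead and converts it to Date, so that scale_x_date() from the question can be used for month labels. APRA_demo is hypothetical stand-in data, and Post_Month and monthly are names introduced only for this example.

```r
library(ggplot2)

# Hypothetical stand-in for APRA: POSIXct days from Nov 2015 to Feb 2016
set.seed(1)
APRA_demo <- data.frame(
  Post_Day   = as.POSIXct("2015-11-01", tz = "UTC") +
    86400 * sample(0:119, 300, replace = TRUE),
  Page_Views = sample(50:500, 300, replace = TRUE)
)

# First day of each month as a Date; the year is kept,
# so months from different years are not merged
APRA_demo$Post_Month <- as.Date(format(APRA_demo$Post_Day, "%Y-%m-01", tz = "UTC"))

# Sum the page views per month
monthly <- aggregate(Page_Views ~ Post_Month, data = APRA_demo, FUN = sum)

# Post_Month is of class Date, so scale_x_date() works without error
p <- ggplot(monthly, aes(x = Post_Month, y = Page_Views)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
p
```

Because the x variable stays a Date rather than a factor, the axis remains continuous and geom_line() needs no extra group aesthetic.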

Please let me know if this is what you were looking for. Also I haven't compiled it, please tell if it throws some error.

Upvotes: 0
