Reputation: 489
I have a web visits over time chart which plots daily traffic from 2014 until now, and looks like this:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)
As you can see, it's not a great graph; it would make more sense to break it down by month rather than by day. However, when I try this code:
ggplot(subset(APRA, Post_Day > "2013-12-31"), aes(x = Post_Day, y = Page_Views))+
geom_line()+
scale_y_continuous(labels = comma)+
ylim(0,50000)+
scale_x_date(date_breaks = "1 month", minor_breaks = "1 week", labels = date_format("%B"))
I get this error:
Error: Invalid input: date_trans works with objects of class Date only
The date field Post_Day is POSIXct. Page_Views is numeric. Data looks like:
Post_Title Post_Day Page_Views
Title 1 2016-05-15 139
Title 2 2016-05-15 61
Title 3 2016-05-15 79
Title 4 2016-05-16 125
Title 5 2016-05-17 374
Title 6 2016-05-17 39
Title 7 2016-05-17 464
Title 8 2016-05-17 319
Title 9 2016-05-18 84
Title 10 2016-05-18 64
Title 11 2016-05-19 433
Title 12 2016-05-19 418
Title 13 2016-05-19 124
Title 14 2016-05-19 422
I'm looking to change the X axis from a daily granularity into monthly.
Upvotes: 0
Views: 2607
Reputation: 42544
The sample data set shown in the question has multiple data points per day. So, it needs to be aggregated day-wise anyway. For the aggregation by day or month, data.table and lubridate are used.
As no reproducible example is supplied, a sample data set is created:
library(data.table)
n_rows <- 5000L
n_days <- 365L*3L
set.seed(123L)
DT <- data.table(Post_Title = paste("Title", 1:n_rows),
Post_Day = as.Date("2014-01-01") + sample(0:n_days, n_rows, replace = TRUE),
Page_Views = round(abs(rnorm(n_rows, 500, 200))))[order(Post_Day)]
DT
      Post_Title   Post_Day Page_Views
   1:   Title 74 2014-01-01        536
   2:  Title 478 2014-01-01        465
   3: Title 3934 2014-01-01        289
   4: Title 4136 2014-01-01        555
   5:  Title 740 2014-01-02        442
  ---
4996: Title 1478 2016-12-31        586
4997: Title 2251 2016-12-31        467
4998: Title 2647 2016-12-31        468
4999: Title 3243 2016-12-31        498
5000: Title 4302 2016-12-31        309
Without aggregation, the data can be plotted with the first call below; the second call sums the page views per day before plotting:
library(ggplot2)
ggplot(DT) + aes(Post_Day, Page_Views) + geom_line()
ggplot(DT[, .(Page_Views = sum(Page_Views)), by = Post_Day]) +
aes(Post_Day, Page_Views) + geom_line()
To aggregate day-wise, the by parameter of data.table is used with sum() as the aggregation function. The aggregation reduces the number of data points from 5000 to 1087. Hence, the plot looks less cluttered.
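As a small check of the day-wise aggregation, here is a sketch that applies the same by/sum() pattern to the fourteen sample rows from the question. The name sample_views is made up for this illustration, and Post_Day is stored as Date here for simplicity.

```r
library(data.table)

# The 14 sample rows shown in the question (sample_views is a hypothetical name)
sample_views <- data.table(
  Post_Title = paste("Title", 1:14),
  Post_Day   = as.Date(c(rep("2016-05-15", 3), "2016-05-16",
                         rep("2016-05-17", 4), rep("2016-05-18", 2),
                         rep("2016-05-19", 4))),
  Page_Views = c(139, 61, 79, 125, 374, 39, 464, 319, 84, 64, 433, 418, 124, 422)
)

# One row per day, with the page views of all posts on that day summed up
daily <- sample_views[, .(Page_Views = sum(Page_Views)), by = Post_Day]
daily
```

The 14 rows collapse to 5, one per day, with 279, 125, 1196, 148 and 1397 page views.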
ggplot(DT[, .(Page_Views = sum(Page_Views)),
by = .(Post_Month = lubridate::floor_date(Post_Day, "month"))]) +
aes(Post_Month, Page_Views) + geom_line()
To aggregate by month, the by parameter is used again, but this time Post_Day is mapped to the first day of the respective month. So, 2014-03-26 becomes a Post_Month of 2014-03-01, which keeps the class of Post_Day (Date in the sample data here, POSIXct in the question). This way, the x-axis remains continuous with a date scale. This avoids the trouble of converting Post_Day to a character string or factor, e.g., "2014-03" using format(Post_Day, "%Y-%m"), where the x-axis would become discrete.
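To illustrate the difference, and to get the month labels the question asked for, here is a sketch under a few assumptions: the vectors days and days_ct and the data frame monthly are made up for illustration (the page view numbers are not real data), and the break and label settings are only one possible choice. Since floor_date() keeps the class of its input, a POSIXct column as in the question probably needs as.Date() before scale_x_date() accepts it; scale_x_datetime() would be the alternative for POSIXct.

```r
library(ggplot2)
library(lubridate)

# Hypothetical dates for illustration
days <- as.Date(c("2014-03-26", "2014-03-02", "2014-04-15"))

floor_date(days, "month")   # still Date: 2014-03-01 2014-03-01 2014-04-01
format(days, "%Y-%m")       # character: "2014-03" "2014-03" "2014-04"

# POSIXct input stays POSIXct after flooring; convert it for scale_x_date()
# (tz = "UTC" is assumed here to keep the conversion free of time zone shifts)
days_ct <- as.POSIXct(c("2014-03-26", "2014-04-15"), tz = "UTC")
post_month <- as.Date(floor_date(days_ct, "month"))

# Made-up monthly totals, plotted with month labels on a continuous date axis
monthly <- data.frame(
  Post_Month = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01")),
  Page_Views = c(81000, 76500, 83200, 79900)
)
p <- ggplot(monthly) +
  aes(Post_Month, Page_Views) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
```

Printing p draws the line with one labelled break per month; for several years of data, wider breaks such as "3 months" may be easier to read.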
Upvotes: 1
Reputation: 659
# Note: "%m" keeps only the month number, so the same month of different years is pooled
APRA$month <- as.factor(strftime(APRA$Post_Day, "%m"))
APRA <- APRA[order(as.numeric(APRA$month)), ]
This adds a month column to your data.
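# Sum the page views within each month (x is the subset for one month)
z <- lapply(split(APRA, APRA$month), function(x) sum(as.numeric(x$Page_Views)))
# Stack the per-month sums into a one-column data frame; row names are the months
z <- as.data.frame(do.call(rbind, z))
z$month <- rownames(z)
colnames(z) <- c("Page_Views", "month")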
This creates a data frame z which holds the months and the page views for each month. Now plot it:
# group = 1 is needed so geom_line() connects the points across the discrete month axis
ggplot(z, aes(x = month, y = Page_Views, group = 1)) + geom_line()
Please let me know if this is what you were looking for. Also, I haven't run it, so please tell me if it throws an error.
Upvotes: 0