I'm trying to plot a monthly time series with ggplot2 as part of a time series analysis. This is my data:
library(readr)
library(dplyr)
library(lubridate)
library(ggplot2)

rdata1 <- read_table2("date sales_revenue_incl_credit
2017-07 56,037.46
2017-08 38333.9
2017-09 48716.92
2017-10 65447.67
2017-11 134752.57
2017-12 116477.39
2018-01 78167.25
2018-02 75991.44
2018-03 42520.93
2018-04 70489.92
2018-05 121063.35
2018-06 76308.47
2018-07 118085.7
2018-08 96153.38
2018-09 82827.1
2018-10 109288.83
2018-11 145774.52
2018-12 141572.77
2019-01 123055.83
2019-02 104232.24
2019-03 435086.33
2019-04 74304.96
2019-05 117237.82
2019-06 82013.47
2019-07 99382.67
2019-08 138455.2
2019-09 97301.99
2019-10 137206.09
2019-11 109862.44
2019-12 118150.96
2020-01 140717.9
2020-02 127622.3
2020-03 134126.09")
I then use the code below to convert date to the Date class and plot it, since date_labels and date_breaks make the breaks and labels much easier to control:
rdata1 %>%
  mutate(date = ymd(date)) %>%
  ggplot(aes(date, sales_revenue_incl_credit)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.minor = element_blank())
I get the following error:
Error in seq.int(r1$mon, 12 * (to0$year - r1$year) + to0$mon, by) : 'from' must be a finite number
A simpler version of @Tom's answer is to use a tsibble object and the feasts package:
# Loading the required libraries
library(tibble)
library(dplyr)
library(ggplot2)
library(lubridate)
library(tsibble)
library(feasts)
# Data preparation
df <- tribble(
~date, ~sales_revenue_incl_credit,
"2017-07", 56037.46,
"2017-08", 38333.9,
"2017-09", 48716.92,
"2017-10", 65447.67,
"2017-11", 134752.57,
"2017-12", 116477.39,
"2018-01", 78167.25,
"2018-02", 75991.44,
"2018-03", 42520.93,
"2018-04", 70489.92,
"2018-05", 121063.35,
"2018-06", 76308.47,
"2018-07", 118085.7,
"2018-08", 96153.38,
"2018-09", 82827.1,
"2018-10", 109288.83,
"2018-11", 145774.52,
"2018-12", 141572.77,
"2019-01", 123055.83,
"2019-02", 104232.24,
"2019-03", 435086.33,
"2019-04", 74304.96,
"2019-05", 117237.82,
"2019-06", 82013.47,
"2019-07", 99382.67,
"2019-08", 138455.2,
"2019-09", 97301.99,
"2019-10", 137206.09,
"2019-11", 109862.44,
"2019-12", 118150.96,
"2020-01", 140717.9,
"2020-02", 127622.3,
"2020-03", 134126.09
) %>%
mutate(date = yearmonth(date)) %>%
as_tsibble(index=date)
# Reproducing your plot
df %>% autoplot(sales_revenue_incl_credit) +
scale_x_yearmonth(breaks=seq(1e3)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
panel.grid.minor = element_blank())
Created on 2020-06-19 by the reprex package (v0.3.0)
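Why `breaks = seq(1e3)` gives one tick per month is not obvious from the code. Below is a minimal base-R sketch of the arithmetic, assuming (the answer does not say so) that tsibble stores each yearmonth value as the number of months since January 1970. Under that assumption, `seq(1e3)` is simply 1, 2, ..., 1000, i.e. every month from February 1970 to May 2053, and only the values that fall inside the range of the data can show up on the axis. The helper `months_since_1970()` is hypothetical and only reproduces that assumed encoding.

```r
# Hypothetical helper: map "YYYY-MM" strings to a months-since-January-1970 index,
# mirroring the encoding we ASSUME tsibble's yearmonth uses internally
months_since_1970 <- function(ym) {
  year  <- as.integer(substr(ym, 1, 4))
  month <- as.integer(substr(ym, 6, 7))
  (year - 1970) * 12 + (month - 1)
}

first <- months_since_1970("2017-07")  # first month in the data
last  <- months_since_1970("2020-03")  # last month in the data
c(first, last)
#> [1] 570 602

# seq(1e3) is 1:1000; keep only the candidates inside the data range
candidate_breaks <- seq(1e3)
visible <- candidate_breaks[candidate_breaks >= first & candidate_breaks <= last]
length(visible)  # one candidate per month of data
#> [1] 33
```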
Putting all these concerns together, I performed some data preparation to obtain your desired output. First, as noted in the comments, ymd() needs a day as well as a year and a month, so ymd("2017-07") returns NA; with every date missing, scale_x_date() has no finite values to build its monthly breaks from, which is where the error comes from. I therefore appended the first day of the month to each "year-month" so you can work with a proper Date variable in R. Next, I used the column_to_rownames() function on the month_year column. I appended the year to the month name because duplicate (non-unique) row names are not permitted. I should caution you against using row names, though. Quoting from the documentation (see ?tibble::rownames_to_column):
While a tibble can have row names (e.g., when converting from a regular data frame), they are removed when subsetting with the [ operator. A warning will be raised when attempting to assign non-NULL row names to a tibble. Generally, it is best to avoid row names, because they are basically a character column with different semantics than every other column.
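A quick way to check the uniqueness requirement before calling column_to_rownames() is sketched below in base R. It swaps lubridate's month(label = TRUE) for base R's `month.abb` (English abbreviations regardless of locale), and the names `dates`, `month_only` and `month_year` are illustrative.

```r
# Every month covered by the question's data: July 2017 to March 2020
dates <- seq(as.Date("2017-07-01"), as.Date("2020-03-01"), by = "month")

# Month abbreviation alone vs. month abbreviation plus year
month_only <- month.abb[as.integer(format(dates, "%m"))]
month_year <- paste(month_only, format(dates, "%Y"), sep = "-")

anyDuplicated(month_only) > 0  # TRUE: "Jul" occurs in 2017, 2018 and 2019
anyDuplicated(month_year) > 0  # FALSE: each "Mon-YYYY" label is unique
head(month_year, 3)
#> [1] "Jul-2017" "Aug-2017" "Sep-2017"
```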
You can manipulate the row names below with different naming conventions. Just make sure the labels are unique! See the R code below:
# Loading the required libraries
library(tibble)
library(ggplot2)
library(dplyr)
library(lubridate)
df <- tribble(
~date, ~sales_revenue_incl_credit,
"2017-07", 56037.46,
"2017-08", 38333.9,
"2017-09", 48716.92,
"2017-10", 65447.67,
"2017-11", 134752.57,
"2017-12", 116477.39,
"2018-01", 78167.25,
"2018-02", 75991.44,
"2018-03", 42520.93,
"2018-04", 70489.92,
"2018-05", 121063.35,
"2018-06", 76308.47,
"2018-07", 118085.7,
"2018-08", 96153.38,
"2018-09", 82827.1,
"2018-10", 109288.83,
"2018-11", 145774.52,
"2018-12", 141572.77,
"2019-01", 123055.83,
"2019-02", 104232.24,
"2019-03", 435086.33,
"2019-04", 74304.96,
"2019-05", 117237.82,
"2019-06", 82013.47,
"2019-07", 99382.67,
"2019-08", 138455.2,
"2019-09", 97301.99,
"2019-10", 137206.09,
"2019-11", 109862.44,
"2019-12", 118150.96,
"2020-01", 140717.9,
"2020-02", 127622.3,
"2020-03", 134126.09
)
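# Data preparation: "YYYY-MM" is not a complete date, so append the first day
# of the month before converting, and keep the result in df for plotting
df <- df %>%
  mutate(date = ymd(paste0(date, "-01")),
         month_year = paste(month(date, label = TRUE), year(date), sep = "-")
  )
# Preview with the unique "Mon-YYYY" labels as row names
df %>%
  column_to_rownames("month_year") %>% # moves the month_year column into the row names
  head()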
# Preview of the data frame with row names (e.g., Jul-2017, Aug-2017, Sep-2017, etc.)
               date sales_revenue_incl_credit
Jul-2017 2017-07-01                  56037.46
Aug-2017 2017-08-01                  38333.90
Sep-2017 2017-09-01                  48716.92
Oct-2017 2017-10-01                  65447.67
Nov-2017 2017-11-01                 134752.57
Dec-2017 2017-12-01                 116477.39
# Reproducing your plot
df %>%
ggplot(aes(x = date, y = sales_revenue_incl_credit)) +
geom_line() +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
panel.grid.minor = element_blank())
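The same pitfall can be reproduced in base R, which makes the fix easy to check without lubridate. A minimal sketch: a "%Y-%m" string does not identify a day, so as.Date() returns NA for it, while appending "-01" gives a proper Date. seq(..., by = "month") then produces one date per month, comparable to the ticks requested with date_breaks = "1 month" (an illustration only, not ggplot2's internal break algorithm). `ym` is an illustrative subset of the question's months.

```r
# An incomplete date: the month is given but the day is not, so the result is NA
as.Date("2017-07", format = "%Y-%m")
#> [1] NA

# Illustrative subset of the question's year-month strings
ym <- c("2017-07", "2017-08", "2019-03", "2020-03")

# Appending "-01" anchors each value to the first day of its month
dates <- as.Date(paste0(ym, "-01"))
dates
#> [1] "2017-07-01" "2017-08-01" "2019-03-01" "2020-03-01"

# One date per month between the first and last observation
monthly <- seq(min(dates), max(dates), by = "month")
length(monthly)
#> [1] 33
format(head(monthly, 3), "%Y-%m")
#> [1] "2017-07" "2017-08" "2017-09"
```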