Reputation: 117
I'm trying to create a boxplot based on timeseries data for multiple years. I want to group observations from multiple years by a variable "DAP" (similar to day of year 0-365), order them by day from November to March but only display the Month on the X-Axis.
I can create a custom order and X-Axis by creating a factor with each month, that works
level_order <- c('November', 'December', 'January', 'February', 'March')
plot <- ggplot(data = df, aes(y = y, x = factor(Month,level = level_order), group=DAP)) +
geom_boxplot(fill="grey85", width = 2.0) +
scale_x_discrete(limits = level_order)
plot
Now I'm stuck making the alignment on the X-Axis according to the days of the month. For example the first datapoint from November 26th needs to more right, closer to December.
Changing the X-Axis to "Date" creates monthly labels for each year and also removed the grouping.
plot <- ggplot(data = df, aes(y = y, x = Date, group=DAP)) +
geom_boxplot(fill="grey85")
plot + scale_x_date(date_breaks = "1 month", date_labels = "%B")
Setting the X-Axis to "DAP" instead of date gives me the correct order and spacing , but I need to display month on the X-Axis. How can I combine this last graph with the X-Axis labeling of graph 1?
plot <- ggplot(data = df, aes(y = y, x = DAP, group=DAP)) +
geom_boxplot(fill="grey85")
plot
and here a sample of the dataset
DAP Date Month y
1 47 2010-11-26 November 0.6872708
21 116 2011-02-03 February 0.7643213
41 68 2011-12-17 December 0.7021531
61 137 2012-02-24 February 0.7178306
81 92 2013-01-10 January 0.7330749
101 44 2013-11-23 November 0.6610618
121 113 2014-01-31 January 0.7961012
141 68 2014-12-17 December 0.7510821
161 137 2015-02-24 February 0.7799938
181 92 2016-01-10 January 0.6861423
201 47 2016-11-26 November 0.7155526
221 116 2017-02-03 February 0.7397810
241 72 2017-12-21 December 0.7259670
261 144 2018-03-03 March 0.6725775
281 106 2019-01-24 January 0.7637322
301 65 2019-12-14 December 0.7184616
321 134 2020-02-21 February 0.6760159
Upvotes: 0
Views: 2561
Reputation: 124213
Try this. To get the right order, spacing and labels I make a new date. As year seems to be not relevant I set the year for obs November and December to 2019, and for the other obs to 2020.
df <- structure(list(DAP = c(
47L, 116L, 68L, 137L, 92L, 44L, 113L,
68L, 137L, 92L, 47L, 116L, 72L, 144L, 106L, 65L, 134L
), Date = c(
"2010-11-26",
"2011-02-03", "2011-12-17", "2012-02-24", "2013-01-10", "2013-11-23",
"2014-01-31", "2014-12-17", "2015-02-24", "2016-01-10", "2016-11-26",
"2017-02-03", "2017-12-21", "2018-03-03", "2019-01-24", "2019-12-14",
"2020-02-21"
), Month = c(
"November", "February", "December",
"February", "January", "November", "January", "December", "February",
"January", "November", "February", "December", "March", "January",
"December", "February"
), y = c(
0.6872708, 0.7643213, 0.7021531,
0.7178306, 0.7330749, 0.6610618, 0.7961012, 0.7510821, 0.7799938,
0.6861423, 0.7155526, 0.739781, 0.725967, 0.6725775, 0.7637322,
0.7184616, 0.6760159
)), row.names = c(NA, -17L), class = "data.frame")
library(ggplot2)
# Make a new Date to get the correct order as with DAP.
# Set year for obs November and Decemeber to 2019,
# for other Obs to 2020,
df$Date1 <- gsub("20\\d{2}-(1\\d{1})", "2019-\\1", df$Date)
df$Date1 <- gsub("20\\d{2}-(0\\d{1})", "2020-\\1", df$Date1)
df$Date1 <- as.Date(df$Date1)
# use new date gives correcr order, spacing and labels
# Also adjusted limits
plot <- ggplot(data = df, aes(y = y, x = Date1, group = DAP)) +
geom_boxplot(fill = "grey85")
plot +
scale_x_date(date_breaks = "1 month", date_labels = "%B", limits = c(as.Date("2019-11-01"), as.Date("2020-03-31")))
Upvotes: 0
Reputation: 4524
The following approach uses tidyverse. The date is separated into year-month-day and those newly created columns are made numeric. In the ggplot part position_dodge2(preserve = "single")
is used which keeps the boxwidth the same. scale_x_discrete
helps to redefine x-axis breaks and tick labels. width = 1
controls the distance between the boxes.
library(tidyverse)
df <- tibble::tribble(
~DAP, ~Date, ~Month, ~y,
47, "2010-11-26", "November", 0.6872708,
116, "2011-02-03", "February", 0.7643213,
68, "2011-12-17", "December", 0.7021531,
137, "2012-02-24", "February", 0.7178306,
92, "2013-01-10", "January", 0.7330749,
44, "2013-11-23", "November", 0.6610618,
113, "2014-01-31", "January", 0.7961012,
68, "2014-12-17", "December", 0.7510821,
137, "2015-02-24", "February", 0.7799938,
92, "2016-01-10", "January", 0.6861423,
47, "2016-11-26", "November", 0.7155526,
116, "2017-02-03", "February", 0.7397810,
72, "2017-12-21", "December", 0.7259670,
144, "2018-03-03", "March", 0.6725775,
106, "2019-01-24", "January", 0.7637322,
65, "2019-12-14", "December", 0.7184616,
134, "2020-02-21", "February", 0.6760159
)
df$Date <- as.Date(df$Date)
df %>%
separate(Date, sep = "-", into = c("year", "month", "day")) %>%
mutate_at(vars("year":"day"), as.numeric) %>%
select(-c(year, Month)) %>%
ggplot(aes(
x = factor(month, level = c(11, 12, 1, 2, 3)), y = y,
group = DAP, color = factor(month)
)) +
geom_boxplot(width = 1, lwd = 0.2, position = position_dodge2(preserve = "single")) +
scale_x_discrete(
breaks = c(11, 12, 1, 2, 3),
labels = c("November", "December", "January", "February", "March")
) +
labs(x = "") +
theme(legend.position = "none")
Upvotes: 1