Alwin

Reputation: 117

ggplot boxplot with custom X-Axis and grouping and sorting on separate values

I'm trying to create a boxplot from time series data spanning multiple years. I want to group observations from multiple years by a variable "DAP" (similar to day of year, 0-365), order them by day from November to March, and display only the month on the X-Axis.

I can create a custom order and X-Axis by turning Month into a factor with the months in the order I want, and that works:

level_order <- c('November', 'December', 'January', 'February', 'March')
plot <- ggplot(data = df, aes(y = y, x = factor(Month, levels = level_order), group = DAP)) +
  geom_boxplot(fill = "grey85", width = 2.0) +
  scale_x_discrete(limits = level_order)
plot


Now I'm stuck aligning the boxes on the X-Axis according to the day of the month. For example, the first data point, from November 26th, needs to sit further right, closer to December.

Changing the X-Axis to "Date" creates monthly labels for each year and also removes the grouping.

plot <- ggplot(data = df, aes(y = y, x = Date, group=DAP)) +
  geom_boxplot(fill="grey85")
plot + scale_x_date(date_breaks = "1 month", date_labels = "%B")


Setting the X-Axis to "DAP" instead of "Date" gives me the correct order and spacing, but I need to display the month on the X-Axis. How can I combine this last graph with the X-Axis labeling of graph 1?

plot <- ggplot(data = df, aes(y = y, x = DAP, group=DAP)) +
  geom_boxplot(fill="grey85")
plot


Here is a sample of the dataset:

    DAP Date       Month    y
1    47 2010-11-26 November 0.6872708
21  116 2011-02-03 February 0.7643213
41   68 2011-12-17 December 0.7021531
61  137 2012-02-24 February 0.7178306
81   92 2013-01-10  January 0.7330749
101  44 2013-11-23 November 0.6610618
121 113 2014-01-31  January 0.7961012
141  68 2014-12-17 December 0.7510821
161 137 2015-02-24 February 0.7799938
181  92 2016-01-10  January 0.6861423
201  47 2016-11-26 November 0.7155526
221 116 2017-02-03 February 0.7397810
241  72 2017-12-21 December 0.7259670
261 144 2018-03-03    March 0.6725775
281 106 2019-01-24  January 0.7637322
301  65 2019-12-14 December 0.7184616
321 134 2020-02-21 February 0.6760159

Upvotes: 0

Views: 2561

Answers (2)

stefan

Reputation: 124213

Try this. To get the right order, spacing, and labels, I create a new date column. Since the year does not seem to matter, I set the year to 2019 for the November and December observations and to 2020 for all other observations.

df <- structure(list(DAP = c(
  47L, 116L, 68L, 137L, 92L, 44L, 113L,
  68L, 137L, 92L, 47L, 116L, 72L, 144L, 106L, 65L, 134L
), Date = c(
  "2010-11-26",
  "2011-02-03", "2011-12-17", "2012-02-24", "2013-01-10", "2013-11-23",
  "2014-01-31", "2014-12-17", "2015-02-24", "2016-01-10", "2016-11-26",
  "2017-02-03", "2017-12-21", "2018-03-03", "2019-01-24", "2019-12-14",
  "2020-02-21"
), Month = c(
  "November", "February", "December",
  "February", "January", "November", "January", "December", "February",
  "January", "November", "February", "December", "March", "January",
  "December", "February"
), y = c(
  0.6872708, 0.7643213, 0.7021531,
  0.7178306, 0.7330749, 0.6610618, 0.7961012, 0.7510821, 0.7799938,
  0.6861423, 0.7155526, 0.739781, 0.725967, 0.6725775, 0.7637322,
  0.7184616, 0.6760159
)), row.names = c(NA, -17L), class = "data.frame")

library(ggplot2)

# Make a new date column that sorts in the same order as DAP.
# Set the year to 2019 for observations in November and December
# and to 2020 for all other observations.
df$Date1 <- gsub("20\\d{2}-(1\\d{1})", "2019-\\1", df$Date)
df$Date1 <- gsub("20\\d{2}-(0\\d{1})", "2020-\\1", df$Date1)
df$Date1 <- as.Date(df$Date1)

# Using the new date gives the correct order, spacing, and labels.
# The axis limits are also adjusted to cover November through March.
plot <- ggplot(data = df, aes(y = y, x = Date1, group = DAP)) +
  geom_boxplot(fill = "grey85")
plot +
  scale_x_date(date_breaks = "1 month", date_labels = "%B", limits = c(as.Date("2019-11-01"), as.Date("2020-03-31")))
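
As a side note, the same re-dating step could also be done without regular expressions. The following is only a sketch of that idea, not the code used above: `to_season_date` and its `start_year` argument are hypothetical names introduced here, and the cutoff (October onwards goes to the first year) is an assumption that mirrors what the `1\\d` pattern matches. Choosing 2019 means the second year is 2020, a leap year, so a 29 February in the data would still be a valid date.

```r
# Hypothetical helper (not part of the answer above): map every date onto a
# single November-to-March season by replacing only the year.
to_season_date <- function(date, start_year = 2019) {
  date <- as.Date(date)
  month <- as.integer(format(date, "%m"))
  # Assumption: months from October onwards belong to the first year of the
  # season, all earlier months to the following year.
  year <- ifelse(month >= 10, start_year, start_year + 1)
  as.Date(paste0(year, format(date, "-%m-%d")))
}

season_dates <- to_season_date(c("2010-11-26", "2011-02-03", "2018-03-03"))
print(season_dates)
# [1] "2019-11-26" "2020-02-03" "2020-03-03"
```

With the data frame from above this would be used as `df$Date1 <- to_season_date(df$Date)`, and the plotting code stays the same.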

Upvotes: 0

MarBlo

Reputation: 4524

The following approach uses the tidyverse. The date is split into year, month, and day columns, which are then converted to numeric. In the ggplot call, position_dodge2(preserve = "single") keeps all boxes the same width. scale_x_discrete redefines the x-axis breaks and tick labels, and width = 1 controls the distance between the boxes.

library(tidyverse)

df <- tibble::tribble(
  ~DAP, ~Date, ~Month, ~y,
  47, "2010-11-26", "November", 0.6872708,
  116, "2011-02-03", "February", 0.7643213,
  68, "2011-12-17", "December", 0.7021531,
  137, "2012-02-24", "February", 0.7178306,
  92, "2013-01-10", "January", 0.7330749,
  44, "2013-11-23", "November", 0.6610618,
  113, "2014-01-31", "January", 0.7961012,
  68, "2014-12-17", "December", 0.7510821,
  137, "2015-02-24", "February", 0.7799938,
  92, "2016-01-10", "January", 0.6861423,
  47, "2016-11-26", "November", 0.7155526,
  116, "2017-02-03", "February", 0.7397810,
  72, "2017-12-21", "December", 0.7259670,
  144, "2018-03-03", "March", 0.6725775,
  106, "2019-01-24", "January", 0.7637322,
  65, "2019-12-14", "December", 0.7184616,
  134, "2020-02-21", "February", 0.6760159
)
df$Date <- as.Date(df$Date)

df %>%
  separate(Date, sep = "-", into = c("year", "month", "day")) %>%
  mutate(across(year:day, as.numeric)) %>%
  select(-c(year, Month)) %>%
  ggplot(aes(
    x = factor(month, levels = c(11, 12, 1, 2, 3)), y = y,
    group = DAP, color = factor(month)
  )) +
  geom_boxplot(width = 1, lwd = 0.2, position = position_dodge2(preserve = "single")) +
  scale_x_discrete(
    breaks = c(11, 12, 1, 2, 3),
    labels = c("November", "December", "January", "February", "March")
  ) +
  labs(x = "") +
  theme(legend.position = "none")
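
If the day-level spacing from the question's DAP plot is also needed, one possible variation is to keep DAP on a continuous axis and only relabel it with month names. This is a rough sketch under an assumption that is not stated in the question: DAP 0 is taken to fall on 10 October, which is inferred from DAP 47 being 26 November in the sample. The names `season_start`, `month_starts` and `month_breaks` are made up for this example, and a non-leap reference year is used, so dates after 29 February in a leap year would be off by one day.

```r
library(ggplot2)

# Sample data from the question (DAP and y only).
df <- data.frame(
  DAP = c(47, 116, 68, 137, 92, 44, 113, 68, 137, 92, 47, 116, 72, 144,
          106, 65, 134),
  y = c(0.6872708, 0.7643213, 0.7021531, 0.7178306, 0.7330749, 0.6610618,
        0.7961012, 0.7510821, 0.7799938, 0.6861423, 0.7155526, 0.7397810,
        0.7259670, 0.6725775, 0.7637322, 0.7184616, 0.6760159)
)

# Assumption: DAP 0 corresponds to 10 October (inferred, not given).
season_start <- as.Date("2018-10-10")
# First day of each month from November to April in the reference season.
month_starts <- seq(as.Date("2018-11-01"), as.Date("2019-04-01"), by = "month")
# DAP value at which each month starts.
month_breaks <- as.numeric(month_starts - season_start)

plot <- ggplot(df, aes(x = DAP, y = y, group = DAP)) +
  geom_boxplot(fill = "grey85") +
  scale_x_continuous(
    breaks = month_breaks,
    labels = format(month_starts, "%B"),
    limits = range(month_breaks)
  ) +
  labs(x = "")
plot
```

Each tick then marks the first day of a month, so a box for 26 November sits close to the December tick, as asked for in the question.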

Upvotes: 1
