Reputation: 343
I am trying to make a bar plot in ggplot of time series data that only includes data for January, February, and March of each year (40+ years). I want one bar per month for all the years in the dataset, with the bars next to each other (no spaces). To complicate things, some months are missing (i.e. some years might only have Jan and Mar data). The bars seem to plot correctly, but there are spaces between them. I have tried countless things but can't figure out how to remove the "empty"/"missing" months (these "missing" months are not part of my data frame). I have not been able to find this question on SO.
Example df:
example <- structure(list(season = c("winter", "winter", "winter", "winter",
"winter", "winter", "winter", "winter", "winter", "winter", "winter",
"winter", "winter", "winter", "winter", "winter", "winter", "winter",
"winter", "winter", "winter", "winter", "winter", "winter", "winter"
), date = c("19-Mar-90", "20-Feb-91", "12-Feb-12", "07-Jan-94",
"28-Mar-85", "26-Feb-92", "11-Feb-92", "24-Mar-10", "30-Mar-13",
"18-Feb-83", "11-Mar-94", "02-Feb-10", "29-Jan-88", "04-Feb-83",
"10-Jan-89", "06-Jan-93", "13-Feb-77", "02-Feb-11", "03-Mar-79",
"27-Mar-81", "28-Jan-85", "13-Mar-08", "15-Feb-17", "14-Jan-90",
"31-Mar-82"), day = c(19L, 20L, 12L, 7L, 28L, 26L, 11L, 24L,
30L, 18L, 11L, 2L, 29L, 4L, 10L, 6L, 13L, 2L, 3L, 27L, 28L, 13L,
15L, 14L, 31L), month = c("Mar", "Feb", "Feb", "Jan", "Mar",
"Feb", "Feb", "Mar", "Mar", "Feb", "Mar", "Feb", "Jan", "Feb",
"Jan", "Jan", "Feb", "Feb", "Mar", "Mar", "Jan", "Mar", "Feb",
"Jan", "Mar"), monthNum = c(3L, 2L, 2L, 1L, 3L, 2L, 2L, 3L, 3L,
2L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 3L, 2L, 1L, 3L
), year = c(1990L, 1991L, 2012L, 1994L, 1985L, 1992L, 1992L,
2010L, 2013L, 1983L, 1994L, 2010L, 1988L, 1983L, 1989L, 1993L,
1977L, 2011L, 1979L, 1981L, 1985L, 2008L, 2017L, 1990L, 1982L
), Anomaly_yr = c(-0.0735902041331981, -0.59089540108907, 0,
0.614707415070896, -0.707106781186548, -0.5, -0.5, -0.707106781186547,
-0.707106781186547, 0.694734791305552, -0.892574080787689, 0.707106781186548,
-0.707106781186547, -0.537781361281464, 0.225223907763323, 1.15470053837925,
-0.707106781186548, -0.574952563436588, 0.357882806098879, -0.707106781186547,
0.707106781186548, 1.15470053837925, 0.707106781186548, 0.810170077754531,
-0.707106781186547)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -25L), groups = structure(list(
season = c("winter", "winter", "winter", "winter", "winter",
"winter", "winter", "winter", "winter", "winter", "winter",
"winter", "winter", "winter", "winter", "winter", "winter",
"winter", "winter"), year = c(1977L, 1979L, 1981L, 1982L,
1983L, 1985L, 1988L, 1989L, 1990L, 1991L, 1992L, 1993L, 1994L,
2008L, 2010L, 2011L, 2012L, 2013L, 2017L), .rows = structure(list(
17L, 19L, 20L, 25L, c(10L, 14L), c(5L, 21L), 13L, 15L,
c(1L, 24L), 2L, 6:7, 16L, c(4L, 11L), 22L, c(8L, 12L),
18L, 3L, 9L, 23L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -19L), .drop = TRUE))
Format the date:
example$date <- as.Date(example$date, format = "%d-%b-%y")
I am including this fill column because otherwise, with this many bars, there is a pixel issue and some bars do not plot:
example <- example %>%
mutate(col = factor(ifelse(Anomaly_yr > 0, 2, 1)))
Tried:
ggplot(data = example,
mapping = aes(x = date, y = Anomaly_yr,
fill = col,
color = col
)) +
geom_bar(stat = "identity",
position = "dodge",
show.legend=FALSE)
but it plots with spaces between the bars (I'm assuming because it is holding a place for April-December). I tried different position arguments to check whether that would make a difference.
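A quick way to see where the spaces come from, sketched with five dates taken from the example data (the names `dates` and `gaps` are just for illustration): on a Date x-axis, horizontal position is proportional to elapsed days, so the distance from one winter's last bar to the next winter's first bar is much larger than the distance between bars within a winter. Since every date is a distinct x value, `position = "dodge"` has nothing to dodge.

```r
# Five dates from the example data, already parsed (illustrative subset).
dates <- as.Date(c("1990-01-14", "1990-03-19", "1991-02-20",
                   "1992-02-11", "1992-02-26"))

# Days between consecutive observations; a continuous date scale
# draws these as proportional horizontal gaps between bars.
gaps <- diff(sort(dates))
print(gaps)
```

The within-winter gaps are a few weeks, while the gaps that span April to December are close to a full year, which is the empty space visible in the plot.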
I then tried the following after looking at various SO posts:
ggplot(data = example, aes(x = date, y = Anomaly_yr, fill = col, color = col)) + # need the fill argument due to the pixel issue
geom_bar(stat = "identity",
position = "dodge",
show.legend=FALSE) +
scale_x_date(date_labels = "%Y - %b",
date_breaks = "1 month")
and tried many variations of scale_x_date, including:
#scale_x_date(date_labels = "%d - %b", date_breaks = "1 day",
# expand = c(0,0))
and
#scale_x_date(labels = date_format("%y"), breaks = date_breaks("year"))
and
#scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
Then I tried...
example$YearMonth <- as.Date(paste(example$year, example$monthNum, "1", sep = "-"))
example$YearMonth <- as.Date(paste0(format(example$YearMonth, "%Y-%m"), "-01"))
example$YearMonth_Label <- format(example$YearMonth, "%Y %b")
# Create the order of the months
month_order <- c("Jan", "Feb", "Mar")
example$YearMonth_Label <- factor(example$YearMonth_Label, levels = unique(example$YearMonth_Label))
ggplot(data = example, aes(x = YearMonth_Label, y = Anomaly_yr, fill = col)) + #need to add fill argument due to pixel issue
geom_bar(stat = "identity",
position = "dodge",
show.legend=FALSE) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 8)) +
theme(axis.text.x = element_text(angle = 90))
which almost seems to work, but then the x-axis is not in chronological order.
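One possible reason for the ordering problem, as a minimal sketch: `unique()` returns values in order of first appearance, so the factor levels follow row order rather than date order. Assuming a small stand-in data frame (`toy` and `chron_levels` are hypothetical names, and the rows are only a subset of the real data), sorting by the underlying date before taking `unique()` gives chronological levels:

```r
# Small stand-in for `example`; rows are deliberately out of date order.
toy <- data.frame(
  year     = c(1990L, 1977L, 1990L, 1985L),
  monthNum = c(3L, 2L, 1L, 3L)
)

# First-of-month date and a text label, as in the attempt above.
toy$YearMonth <- as.Date(sprintf("%d-%02d-01", toy$year, toy$monthNum))
toy$YearMonth_Label <- format(toy$YearMonth, "%Y %b")

# Order the labels by the real date first, then take unique() for the levels.
chron_levels <- unique(toy$YearMonth_Label[order(toy$YearMonth)])
toy$YearMonth_Label <- factor(toy$YearMonth_Label, levels = chron_levels)

print(levels(toy$YearMonth_Label))
```

Applied to `example`, the same `levels =` expression should put the discrete x-axis in date order, and `scale_x_discrete(labels = function(x) substr(x, 1, 4))` would then print only the year part of each label.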
I would like the x-axis labels to be the year; there is no need for month labels. Is there a way to plot this time series with 2-3 months per year but without spaces between the bars? Am I missing something obvious, or is there some bigger underlying issue? I could not find any duplicate posts concerning repeated missing months over long time periods. Thank you!
Edit: maybe I should add that this example df probably has missing years, but my data does not
As suggested by @scott.pilgrim.vs.r in the comments, using as.factor(year):
ggplot(data = example, aes(x = as.factor(year), y = Anomaly_yr, fill = col)) + #need to add fill argument due to pixel issue
geom_bar(stat = "identity",
position = "dodge",
show.legend=FALSE)
Upvotes: 1
Views: 256
Reputation: 66880
Here the bars are in chronological order, but the x-axis is discrete rather than mapped linearly to date:
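example |>
  ungroup() |>      # the example is a grouped_df; build the labels over all rows
  mutate(date = as.Date(date, format = "%d-%b-%y")) |>
  arrange(date) |>  # sort so that unique() below yields chronological levels
  mutate(date_label = format(date, "%b\n%Y"),
         date_label = factor(date_label, levels = unique(date_label))) |>
  ggplot(mapping = aes(x = date_label, y = Anomaly_yr,
                       fill = Anomaly_yr > 0)) +
  geom_col(show.legend = FALSE)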
Upvotes: 1
Reputation: 507
Okay, here is my suggestion:
First I filtered the data (`df` here) so that only January to March remain.
df_clean <- df %>%
filter(month %in% c('Jan', 'Feb', 'Mar'))
Then I saw that some year and month combinations had multiple Anomaly_yr values, so I averaged them.
a <- aggregate(Anomaly_yr ~ month + year, data = df_clean, FUN = mean)
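To illustrate what the averaging step does, here is a small sketch on made-up numbers (the `toy` data frame, its values, and the name `avg` are assumptions for illustration only): rows that share the same month and year collapse to a single row holding their mean.

```r
# Made-up data: February 1992 appears twice, January 1990 once.
toy <- data.frame(
  month      = c("Feb", "Feb", "Jan"),
  year       = c(1992L, 1992L, 1990L),
  Anomaly_yr = c(-0.5, -0.3, 0.8)
)

# One row per month/year combination, with the mean of Anomaly_yr.
avg <- aggregate(Anomaly_yr ~ month + year, data = toy, FUN = mean)
print(avg)
```

After this step each year has at most one value per month, so dodging by month gives at most three bars per year.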
Now for the graph. I used as.factor(year) to ensure even spacing between years. I also used position = position_dodge2(width = 0.9, preserve = "single") to make all bars the same width. You can also fix the x-axis name with labs(x = 'year').
ggplot(data = a, aes(x = as.factor(year), y = Anomaly_yr,
                     # explicit levels keep the bars in calendar order;
                     # as.factor(month) would sort them alphabetically (Feb, Jan, Mar)
                     fill = factor(month, levels = c("Jan", "Feb", "Mar")))) +
  geom_bar(stat = "identity",
           position = position_dodge2(width = 0.9, preserve = "single"),
           show.legend = FALSE) +
  labs(x = 'year') +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 8)) + # str_wrap() comes from stringr
  theme(axis.text.x = element_text(angle = 90))
Upvotes: 0