JPB
JPB

Reputation: 11

Plotting average monthly counts per decade on a plot

I have a data set that has monthly "flows" over 68 years. I am trying to make a comparison of flow distributions by decade by making a plot that has a seasonal distribution on the x-axis and displays a mean value for each decade on the plot.

Upvotes: 0

Views: 467

Answers (2)

Jon Spring
Jon Spring

Reputation: 66445

(Edit: changed from line graph to dodged bar plot, to better align with OP code.)

Here's an approach using dplyr, tidyr, and ggplot2 from tidyverse.

library(tidyverse)
M %>%
  group_by(Decade = floor(Year/10)*10) %>%
  summarize_at(vars(Jan:Sep), mean) %>%

  # This uses tidyr::pivot_longer to reshape the data longer, which gives us the
  #  ability to map decade to color.
  pivot_longer(-Decade, names_to = "Month", values_to = "Avg") %>%

  # This step to get the months to be an ordered factor in order of appearance, 
  #   which is necessary to avoid the months showing up in alphabetical order.
  mutate(Month = fct_inorder(Month)) %>%
  # Alternatively, we could have aligned these thusly
  # mutate(Month_order = match(Month, month.abb)) %>%
  # mutate(Month = fct_reorder(Month, Month_order)) %>%

  ggplot(aes(Month, Avg, fill = as.factor(Decade))) +
  geom_col(position = position_dodge()) +
  scale_fill_discrete(name = "Decade")

enter image description here

Upvotes: 1

jmcastagnetto
jmcastagnetto

Reputation: 486

Using your sample data, and the tidyverse packages, the following code will calculate the average per decade and month:

library(tidyverse)

x <- "Year    Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
1948    29550   47330   64940   61140   20320   17540   37850   29250   17100   
1949    45700   53200   37870   36310   39200   23040   31170   23640   19720   
1950    16050   17950   27040   21610   15510   16090   12010   11360   14390   
1951    14280   13210   16260   24280   13570   9547    9921    8129    7304    
1952    19030   29250   58860   31780   19940   16930   9268    9862    9708    
1953    24340   28020   31830   29700   44980   15630   22660   14190   13430   
1954    34660   23260   24390   21500   13250   10860   10700   8188    6092    
1955    14050   19430   12780   19330   12210   7892    12450   10920   6850    
1956    7262    20800   27680   24110   13560   8594    10150   7721    10540   
1957    14470   13350   22720   39860   23980   12630   10230   7008    8567"

d <- read_table(x) %>% 
  mutate(
    decade = (Year %/% 10)*10 # add column for decade
  ) %>% 
  select(-Year) %>%  # remove the year
  pivot_longer(  # convert to a 'tidy' (long) format
    cols = Jan:Sep,
    names_to = "month",
    values_to = "count"
  ) %>% 
  mutate(
    month = factor(month, levels = month.abb, ordered = TRUE)  # make sure months are ordered
  ) %>% 
  group_by(decade, month) %>% 
  summarise(
    mean = mean(count)
  )

If you print that dataframe, you get:

> d
# A tibble: 18 x 3
# Groups:   decade [2]
   decade month   mean
    <dbl> <ord>  <dbl>
 1   1940 Jan   37625 
 2   1940 Feb   50265 
 3   1940 Mar   51405 
 4   1940 Apr   48725 
 5   1940 May   29760 
 6   1940 Jun   20290 
 7   1940 Jul   34510 
 8   1940 Aug   26445 
 9   1940 Sep   18410 
10   1950 Jan   18018.
11   1950 Feb   20659.
12   1950 Mar   27695 
13   1950 Apr   26521.
14   1950 May   19625 
15   1950 Jun   12272.
16   1950 Jul   12174.
17   1950 Aug    9672.
18   1950 Sep    9610.

If you need it back in wide format:

d2 <- d %>% 
  pivot_wider(
    id_cols = decade,
    names_from = month,
    values_from = mean
  )
> d2
# A tibble: 2 x 10
# Groups:   decade [2]
  decade    Jan    Feb   Mar    Apr   May    Jun    Jul    Aug    Sep
   <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1   1940 37625  50265  51405 48725  29760 20290  34510  26445  18410 
2   1950 18018. 20659. 27695 26521. 19625 12272. 12174.  9672.  9610.

Upvotes: 1

Related Questions