Evan
Evan

Reputation: 1499

How to tabulate a total amount of points that fall within months?

I have a df1:

ID    Month     Score
716   5/2/2020   30
87    8/1/2020   34
190   5/30/2020  90
18    6/18/2020  100
19    7/2/2020   90
102   8/17/2020  89
60    7/7/2020   99
100   8/6/2020   109
77    9/5/2020   98
88    9/6/2020   99  
99    12/4/2020  55

I would like to make a new df2 that calculates the 'event's' within each month. And then I would like to graph these totals on a chart. Such as:

Df2

Month       Events
05/2020     2
06/2020     1
07/2020     2
08/2020     2
09/2020     2
10/2020     0
11/2020     0
12/2020     1

The graph would have months on the X-axis and events on the y-axis

Upvotes: 1

Views: 53

Answers (2)

akrun
akrun

Reputation: 887611

We convert the 'Month' to Date, and then to yearmon class (from zoo) to get the frequency count, then use complete (from tidyr) to expand the rows for those missing months by taking the sequence from min to max by 1 month interval, and reformat the 'Month'

library(dplyr)
library(lubridate)
library(zoo)
library(tidyr)
df1 %>%
   count(Month = as.Date(as.yearmon(mdy(Month)))) %>%
   complete(Month = seq(min(Month), max(Month), 
         by = '1 month'), fill = list(n = 0)) %>%
   mutate(Month = format(Month, '%m/%Y'))

-output

# A tibble: 8 x 2
#  Month       n
#  <chr>   <dbl>
#1 05/2020     2
#2 06/2020     1
#3 07/2020     2
#4 08/2020     3
#5 09/2020     2
#6 10/2020     0
#7 11/2020     0
#8 12/2020     1

If we need to plot, use ggplot

library(ggplot2)
df1 %>%
  count(Month = as.Date(as.yearmon(mdy(Month)))) %>%
  complete(Month = seq(min(Month), max(Month), 
      by = '1 month'), fill = list(n = 0)) %>%
  ggplot(aes(x = Month, y = n)) + 
     geom_col()+ 
     scale_x_date(date_breaks = '1 month', date_labels = '%m/%Y') + 
     theme_bw()

-output

enter image description here

data

df1 <- structure(list(ID = c(716L, 87L, 190L, 18L, 19L, 102L, 60L, 100L, 
77L, 88L, 99L), Month = c("5/2/2020", "8/1/2020", "5/30/2020", 
"6/18/2020", "7/2/2020", "8/17/2020", "7/7/2020", "8/6/2020", 
"9/5/2020", "9/6/2020", "12/4/2020"), Score = c(30L, 34L, 90L, 
100L, 90L, 89L, 99L, 109L, 98L, 99L, 55L)), class = "data.frame", 
row.names = c(NA, 
-11L))

Upvotes: 1

r2evans
r2evans

Reputation: 160667

aggregate(dat$Score,
          list(format(as.Date(dat$Month, format="%m/%d/%Y"), format="%m/%Y")), 
          FUN = length)
#   Group.1 x
# 1 05/2020 2
# 2 06/2020 1
# 3 07/2020 2
# 4 08/2020 3
# 5 09/2020 2
# 6 12/2020 1

Upvotes: 2

Related Questions