Data Analyst
Data Analyst

Reputation: 61

How to create a data frame aggregating the daily ticket numbers to monthly count?

I am having the list of ticket numbers for each date. Date column is transformed to data but the ticket number column is a text.

Created       Ticket
01-Jan-19   a1
02-Jan-19   a2
03-Jan-19   a3
04-Jan-19   a4
05-Jan-19   a5
06-Jan-19   a6
07-Jan-19   a7
08-Jan-19   a8
09-Jan-19   a9
10-Jan-19   a10
11-Jan-19   a11
12-Jan-19   a12
13-Jan-19   a13
14-Jan-19   a14
15-Jan-19   a15
16-Jan-19   a16
17-Jan-19   a17
18-Jan-19   a18
19-Jan-19   a19
01-Feb-19   a20
02-Feb-19   a21
03-Feb-19   a22
04-Feb-19   a23

Tried to use floor data in R but since the ticket number column is character I am not able to use it.

data <- read.csv(file = 'D:\\DS Data\\SampleTickets.csv', stringsAsFactors = FALSE,header = TRUE)

str(data)
library(readr)
library(lubridate)
library(dplyr)

data <- data %>%
  mutate(Created = dmy(Created))

data %>% group_by(month=floor_date(Created, "month")) %>%
  summarize(amount=sum(Ticket))

I am expecting a data frame output like..

CreatedMonth     CountOfTickets
1/1/2019             18
1/2/2019              4

Upvotes: 2

Views: 45

Answers (2)

Uwe
Uwe

Reputation: 42544

You were nearly there: Just use n() instead of sum(Ticket) to count the number of rows:

library(dplyr)
library(lubridate)
data %>%
  mutate(Created = dmy(Created)) %>%
  group_by(month = floor_date(Created, "month")) %>%
  summarize(amount = n())
# A tibble: 2 x 2
  month      amount
  <date>      <int>
1 2019-01-01     19
2 2019-02-01      4

However, there is a shortcut which uses count():

data %>% 
  count(CreatedMonth = dmy(Created) %>% floor_date("month"))
# A tibble: 2 x 2
  CreatedMonth     n
  <date>       <int>
1 2019-01-01      19
2 2019-02-01       4

For the sake of completeness, here is also a data.table version:

library(lubridate)
library(data.table)
setDT(data)[, .N, by = .(CreatedMonth = floor_date(dmy(Created), "month"))]
   CreatedMonth  N
1:   2019-01-01 19
2:   2019-02-01  4

Data

data <- readr::read_table("Created       Ticket
01-Jan-19   a1
02-Jan-19   a2
03-Jan-19   a3
04-Jan-19   a4
05-Jan-19   a5
06-Jan-19   a6
07-Jan-19   a7
08-Jan-19   a8
09-Jan-19   a9
10-Jan-19   a10
11-Jan-19   a11
12-Jan-19   a12
13-Jan-19   a13
14-Jan-19   a14
15-Jan-19   a15
16-Jan-19   a16
17-Jan-19   a17
18-Jan-19   a18
19-Jan-19   a19
01-Feb-19   a20
02-Feb-19   a21
03-Feb-19   a22
04-Feb-19   a23")

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

Using dplyr we can first convert Created column to actual Date and group them by each month and count number of tickets for each group.

library(dplyr)
df %>%
  mutate(Created = as.Date(Created, "%d-%b-%y")) %>%
  arrange(Created) %>%
  mutate(Yearmon = format(Created, "%B-%Y"), 
         Yearmon = factor(Yearmon, levels = unique(Yearmon))) %>%
  group_by(Yearmon) %>%
  summarise(count = n())


# Yearmon       count
#  <fct>         <int>
#1 January-2019     19
#2 February-2019     4

Upvotes: 0

Related Questions