amathew
amathew

Reputation: 366

Find the count of events preceding a given date that varies

I'm using the data.table package in R. Let's say I have the following data.

dt = data.table(id = c(101,101,101,101,
                       102,102,102, 
                       103,103, 
                       104,104,104,104),
                date = c("2016-01-01","2016-02-01","2016-02-01","2016-03-01",
                         "2016-05-01","2016-06-01","2016-08-01",
                         "2016-01-01","2016-03-01",
                         "2016-01-01","2016-02-01","2016-02-01","2016-03-01"),
                event = c("inbound call","rescheduled","scheduled","completed",
                          "inbound call","rescheduled","scheduled",
                          "incomplete","scheduled",
                          "inbound call","rescheduled","scheduled","inbound call"))
dt

I would like to find a count of the total number of inbound calls for each member that occurred prior to the schedule date event.

If I just wanted the total number of inbound call for each id, I could do the following.

dt[event=="inbound call", .N, by = id]

However, this would be wrong for some member id's. For example, the count is 2 for id 104 because they've had 2 inbound calls. However, only 1 occurred prior to the scheduled date, so it should be 1

How they can I set up the logic where I get a count of the inbound calls that occurred prior to the scheduled event.

Desired output is this

id    num_inbound_calls
101   1
102   1
103   0
104   1

Upvotes: 1

Views: 111

Answers (3)

Shree
Shree

Reputation: 11140

Here's a way with dplyr -

dt %>% 
  group_by(id) %>% 
  summarise(
    calls = sum(event[1:which.max(event == "scheduled")] == "inbound call")
  )

# A tibble: 4 x 2
     id calls
  <dbl> <int>
1   101     1
2   102     1
3   103     0
4   104     1

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

Another version with dplyr assuming you will have at least one "scheduled" for each id is to slice rows till 1st "scheduled" for each group and then count number of "inbound call".

library(dplyr)

dt %>%
  arrange(as.Date(date)) %>%
  group_by(id) %>%
  slice(seq_len(which.max(event == "scheduled"))) %>%
  summarise(n = sum(event == "inbound call"))

#    id     n
#  <dbl> <int>
#1   101     1
#2   102     1
#3   103     0
#4   104     1

A base R option using combination of ave and aggregate

aggregate(event~id, dt[with(dt, ave(event == "scheduled", id, 
  FUN = function(x) seq_along(x) <= which.max(x))), ],     
        function(x) sum(x == "inbound call"))

#  id  event
#1 101     1
#2 102     1
#3 103     0
#4 104     1

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

A possible approach with data.table:

dt[, num_inbound_calls := cumsum(event=="inbound call"), id][event=="scheduled"]

output:

    id       date     event cs
1: 101 2016-02-01 scheduled  1
2: 102 2016-08-01 scheduled  1
3: 103 2016-03-01 scheduled  0
4: 104 2016-02-01 scheduled  1

Upvotes: 3

Related Questions