Reputation: 366
I'm using the data.table package in R. Let's say I have the following data.
dt = data.table(id = c(101,101,101,101,
102,102,102,
103,103,
104,104,104,104),
date = c("2016-01-01","2016-02-01","2016-02-01","2016-03-01",
"2016-05-01","2016-06-01","2016-08-01",
"2016-01-01","2016-03-01",
"2016-01-01","2016-02-01","2016-02-01","2016-03-01"),
event = c("inbound call","rescheduled","scheduled","completed",
"inbound call","rescheduled","scheduled",
"incomplete","scheduled",
"inbound call","rescheduled","scheduled","inbound call"))
dt
I would like to find a count of the total number of inbound calls for each member that occurred prior to the schedule date event.
If I just wanted the total number of inbound call for each id, I could do the following.
dt[event=="inbound call", .N, by = id]
However, this would be wrong for some member id's. For example, the count is 2 for id 104 because they've had 2 inbound calls. However, only 1 occurred prior to the scheduled date, so it should be 1
How they can I set up the logic where I get a count of the inbound calls that occurred prior to the scheduled event.
Desired output is this
id num_inbound_calls
101 1
102 1
103 0
104 1
Upvotes: 1
Views: 111
Reputation: 11140
Here's a way with dplyr
-
dt %>%
group_by(id) %>%
summarise(
calls = sum(event[1:which.max(event == "scheduled")] == "inbound call")
)
# A tibble: 4 x 2
id calls
<dbl> <int>
1 101 1
2 102 1
3 103 0
4 104 1
Upvotes: 1
Reputation: 388817
Another version with dplyr
assuming you will have at least one "scheduled"
for each id
is to slice
rows till 1st "scheduled"
for each group and then count number of "inbound call"
.
library(dplyr)
dt %>%
arrange(as.Date(date)) %>%
group_by(id) %>%
slice(seq_len(which.max(event == "scheduled"))) %>%
summarise(n = sum(event == "inbound call"))
# id n
# <dbl> <int>
#1 101 1
#2 102 1
#3 103 0
#4 104 1
A base R option using combination of ave
and aggregate
aggregate(event~id, dt[with(dt, ave(event == "scheduled", id,
FUN = function(x) seq_along(x) <= which.max(x))), ],
function(x) sum(x == "inbound call"))
# id event
#1 101 1
#2 102 1
#3 103 0
#4 104 1
Upvotes: 1
Reputation: 25225
A possible approach with data.table
:
dt[, num_inbound_calls := cumsum(event=="inbound call"), id][event=="scheduled"]
output:
id date event cs
1: 101 2016-02-01 scheduled 1
2: 102 2016-08-01 scheduled 1
3: 103 2016-03-01 scheduled 0
4: 104 2016-02-01 scheduled 1
Upvotes: 3