Reputation: 160
I am trying to count objects in data frame of 911 calls according to certain conditions and I am having trouble with the logic. My actual data has over 3 million rows, so I've tried to simplify my problem by considering this small subset:
dat <- structure(list(call = c("14-1234", "14-4523", "14-7711", "14-8199", "14-3124"),
badge = c("8456", "1098", "3432", "4750", "5122"),
off.sex = c("Male", "Male", "Female", "Male", "Male"),
shift = c("1", "1", "1", "1", "2"),
assignedmin = c(1902, 1870, 1950, 1899, 1907),
clearedmin = c(1980, 1910, 1990, 1912, 1956)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))
The variable "call" identifies 911 calls, "badge" identifies officers, "shift" basically identifies a stretch of time in a particular area. The specific minute a call comes in is given by "assignedmin" and the call is considered cleared at the time given by "clearedmin."
I want to count how many officers on a given shift are able to respond to a particular call. For example, for call 14-1234, officer 8456 is assigned at time 1902. How many other officers would have been able to respond to that call? Officer 1098 was preoccupied with a different call from minute 1870 to minute 1910, and so would not have been able to respond to the call occurring at minute 1902. However, based on this simple data set officer 3432 would not have been busy at that time and so would be considered available. Officer 5122 was unoccupied at that time, but was on a different shift and so would not be considered available.
Desired output:
call badge off.sex shift assignedmin clearedmin n_shift n_avail n_unavail n_shift_male n_male_avail
1 14-1234 8456 Male 1 1902 1980 4 2 2 3 1
2 14-4523 1098 Male 1 1870 1910 4 4 0 3 3
3 14-7711 3432 Female 1 1950 1990 4 3 1 3 2
4 14-8199 4750 Male 1 1899 1912 4 3 1 3 2
5 14-3124 5122 Male 2 1907 1956 1 1 1 1 1
I hope this is not too convoluted. Basically, at the time given by assignedmin, an officer is available if he or she is on the same shift and not occupied with another call. I can easily count the number of officers on a shift using dplyr and data.table like so:
dat <- dat %>% group_by(shift) %>% mutate(n_shift = uniqueN(badge),
n_shift_male = uniqueN(badge[off.sex == 'Male']) %>% ungroup()
Upvotes: 0
Views: 69
Reputation: 25223
An option using data.table
to count number of officers per shift, then perform a non-equi self join to find out n_unavail
and finally, n_avail = n_shift - n_unavail
:
library(data.table)
setDT(dat)[, c("n_shift", "n_shift_male") := .(.N, sum(off.sex=="Male")), shift]
dat[, c("n_unavail", "n_male_not_avail") :=
dat[dat, on=.(shift, assignedmin<=assignedmin, clearedmin>=assignedmin),
by=.EACHI, .(.N - 1L, sum(x.off.sex[x.call != i.call]=="Male"))][,
(1L:3L) := NULL]
]
dat[, c("n_avail", "n_male_avail") := .(n_shift - n_unavail, n_shift_male - n_male_not_avail)]
output:
call badge off.sex shift assignedmin clearedmin n_shift n_shift_male n_unavail n_male_not_avail n_avail n_male_avail
1: 14-1234 8456 Male 1 1902 1980 4 3 2 2 2 1
2: 14-4523 1098 Male 1 1870 1910 4 3 0 0 4 3
3: 14-7711 3432 Female 1 1950 1990 4 3 1 1 3 2
4: 14-8199 4750 Male 1 1899 1912 4 3 1 1 3 2
5: 14-3124 5122 Male 2 1907 1956 1 1 0 0 1 1
Upvotes: 1
Reputation: 4658
The n_unavail
column can be filled as below. First, I join the table by itself on shift
, so that there is a row for every officer combination in the same shift (this can be infeasible if your dataset is large). Then, I calculate whether the _other
officer is unavailable at the time of the call, and count them.
dat %>%
left_join(dat, by = "shift", suffix = c("", "_other")) %>%
mutate(unavail = (assignedmin_other < assignedmin & clearedmin_other > assignedmin)) %>%
group_by(call) %>%
summarise(n_avail = sum(!unavail),
n_unavail = sum(unavail))
# call n_avail n_unavail
# <chr> <int> <int>
# 1 14-1234 2 2
# 2 14-3124 1 0
# 3 14-4523 4 0
# 4 14-7711 3 1
# 5 14-8199 3 1
This can be joined to your table to get your desired result.
Upvotes: 0