hunter
hunter

Reputation: 160

Using R, how can I count objects according to multiple conditions?

I am trying to count objects in data frame of 911 calls according to certain conditions and I am having trouble with the logic. My actual data has over 3 million rows, so I've tried to simplify my problem by considering this small subset:

dat <- structure(list(call = c("14-1234", "14-4523", "14-7711", "14-8199", "14-3124"), 
                      badge = c("8456", "1098", "3432", "4750", "5122"),
                      off.sex = c("Male", "Male", "Female", "Male", "Male"),
                      shift = c("1", "1", "1", "1", "2"),
                      assignedmin = c(1902, 1870, 1950, 1899, 1907),
                      clearedmin = c(1980, 1910, 1990, 1912, 1956)),
                 class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L))

The variable "call" identifies 911 calls, "badge" identifies officers, "shift" basically identifies a stretch of time in a particular area. The specific minute a call comes in is given by "assignedmin" and the call is considered cleared at the time given by "clearedmin."

I want to count how many officers on a given shift are able to respond to a particular call. For example, for call 14-1234, officer 8456 is assigned at time 1902. How many other officers would have been able to respond to that call? Officer 1098 was preoccupied with a different call from minute 1870 to minute 1910, and so would not have been able to respond to the call occurring at minute 1902. However, based on this simple data set officer 3432 would not have been busy at that time and so would be considered available. Officer 5122 was unoccupied at that time, but was on a different shift and so would not be considered available.

Desired output:

  call    badge off.sex shift assignedmin clearedmin n_shift n_avail n_unavail n_shift_male n_male_avail
1 14-1234 8456  Male    1            1902       1980       4       2         2            3            1
2 14-4523 1098  Male    1            1870       1910       4       4         0            3            3
3 14-7711 3432  Female  1            1950       1990       4       3         1            3            2
4 14-8199 4750  Male    1            1899       1912       4       3         1            3            2
5 14-3124 5122  Male    2            1907       1956       1       1         1            1            1

I hope this is not too convoluted. Basically, at the time given by assignedmin, an officer is available if he or she is on the same shift and not occupied with another call. I can easily count the number of officers on a shift using dplyr and data.table like so:

dat <- dat %>% group_by(shift) %>% mutate(n_shift = uniqueN(badge),
                                          n_shift_male = uniqueN(badge[off.sex == 'Male']) %>% ungroup()

Upvotes: 0

Views: 69

Answers (2)

chinsoon12
chinsoon12

Reputation: 25223

An option using data.table to count number of officers per shift, then perform a non-equi self join to find out n_unavail and finally, n_avail = n_shift - n_unavail:

library(data.table)
setDT(dat)[, c("n_shift", "n_shift_male") := .(.N, sum(off.sex=="Male")), shift]

dat[, c("n_unavail", "n_male_not_avail") :=
        dat[dat, on=.(shift, assignedmin<=assignedmin, clearedmin>=assignedmin),
            by=.EACHI, .(.N - 1L, sum(x.off.sex[x.call != i.call]=="Male"))][,
                (1L:3L) := NULL]
    ]

dat[, c("n_avail", "n_male_avail") := .(n_shift - n_unavail, n_shift_male - n_male_not_avail)]

output:

      call badge off.sex shift assignedmin clearedmin n_shift n_shift_male n_unavail n_male_not_avail n_avail n_male_avail
1: 14-1234  8456    Male     1        1902       1980       4            3         2                2       2            1
2: 14-4523  1098    Male     1        1870       1910       4            3         0                0       4            3
3: 14-7711  3432  Female     1        1950       1990       4            3         1                1       3            2
4: 14-8199  4750    Male     1        1899       1912       4            3         1                1       3            2
5: 14-3124  5122    Male     2        1907       1956       1            1         0                0       1            1

Upvotes: 1

Bas
Bas

Reputation: 4658

The n_unavail column can be filled as below. First, I join the table by itself on shift, so that there is a row for every officer combination in the same shift (this can be infeasible if your dataset is large). Then, I calculate whether the _other officer is unavailable at the time of the call, and count them.

dat %>% 
  left_join(dat, by = "shift", suffix = c("", "_other")) %>% 
  mutate(unavail = (assignedmin_other < assignedmin & clearedmin_other > assignedmin)) %>% 
  group_by(call) %>% 
  summarise(n_avail = sum(!unavail),
            n_unavail = sum(unavail))

#   call    n_avail n_unavail
#   <chr>     <int>     <int>
# 1 14-1234       2         2
# 2 14-3124       1         0
# 3 14-4523       4         0
# 4 14-7711       3         1
# 5 14-8199       3         1

This can be joined to your table to get your desired result.

Upvotes: 0

Related Questions