Reputation: 1
I have a daily user records. Each day a user can have multiple records. I'd like to count the unique users in a rolling 3-day window. How can I achieve that?
set.seed(123)
dat<-data.table(day=rep(1:5,sample(6,5)))
dat$id<-sample(10,dat[,.N],replace=T)
> dat
day id
1: 1 1
2: 1 6
3: 2 9
4: 2 6
5: 2 5
6: 2 10
7: 3 5
8: 3 7
9: 3 6
10: 3 2
11: 3 9
12: 3 3
13: 4 1
14: 4 4
15: 4 10
16: 5 9
17: 5 7
18: 5 7
19: 5 10
20: 5 7
I want the result as following,i.e, for each day x, I want to count number of unique IDs in day x, x-1, and x-2.
sqldf('select a.day,count(distinct b.id) as user_cnt
from dat as a left join dat as b on a.day<=b.day+2 and a.day>=b.day group by a.day')
day user_cnt
1 2
2 5
3 8
4 9
5 9
Upvotes: 0
Views: 113
Reputation: 19394
From reading your question several times and inspecting your desired output, it looks like you want an adaptive rolling "unique" function, aligned to the right (i.e. the previous n days), with a window of 3 days.
Using datatable should be possible, and akrun has provided a non-adaptive solution with align='left'. However, you want align='right' (the default).
library(data.table)
dt[, .(.(id)), day][
, frollapply(seq_len(.N), n = 3, FUN = function(i) uniqueN(unlist(V1[i])))
[1] NA NA 8 9 9
Note that
dt[, .(.(id)), day]
day V1
1: 1 1,6
2: 2 9, 6, 5,10
3: 3 5,7,6,2,9,3
4: 4 1, 4,10
5: 5 9, 7, 7,10, 7
Unfortunately, data.table doesn't provide a partial (adaptive=TRUE) window option for the frollapply
function, unlike the others (frollmean
and frollsum
).
We can try...
nk <- function(x, k) c(seq.int(k), rep(k, x - k))
dt[, .(.(id)), day][
, frollapply(seq_len(.N), n = nk(.N, 3), FUN = function(i) uniqueN(unlist(V1[i])))
V1 V2 V3 V4 V5
1: 2 NA NA NA NA
2: 4 5 NA NA NA
3: 6 7 8 8 8
4: 3 9 9 9 9
5: 3 5 9 9 9
But we get a data.table with 5 columns with the answers lurking in the diagonal.
So I ended up using mapply
with a user-defined function N_unique
to count the unique values within the list of ids returned by the rolling window. We can still use partial window widths with the nk
function defined above.
N_unique <- function(i, width, x){
uniqueN(unlist(x[(i - (width - 1)):i]))
}
dt2 <- dt[, .(.(id)), day][
, user_cnt := mapply(FUN = N_unique, i = seq_len(.N),
width = nk(.N, 3), MoreArgs = list(x = V1))][, V1:=NULL]
dt2
day user_cnt
1: 1 2
2: 2 5
3: 3 8
4: 4 9
5: 5 9
Data:
dput(dt)
structure(list(day = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), id = c(1, 6, 9, 6, 5,
10, 5, 7, 6, 2, 9, 3, 1, 4, 10, 9, 7, 7, 10, 7)), row.names = c(NA,
-20L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0bae2498>)
Note: The command to create dt
using set.seed
resulted in something different to what the OP provided.
Upvotes: 2