Marcus Paget

Reputation: 31

R programming: count occurrences in a sliding window

Given a data.frame containing a user id and a timestamp, is there a quick way to extract the user ids that reach a certain count within a sliding time window?

For example, I want to find all users that appear 10 times within 30 seconds.

My thought is to first keep only the users who reach the target count (10) in the whole data set. Then, for each user, subtract the first timestamp from the last; if the difference is less than the time window (30 seconds), add that user to the target list.

If not, I would compare the first timestamp with the second, then with the third, and so on, until reaching either the time limit (30 seconds) or the target count (10 occurrences). On reaching the time limit, I would need to start again by comparing the second timestamp with the current element.
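A minimal sketch of the approach described above, assuming a data.frame with columns user and time (numeric seconds or POSIXct); has_burst() and users_with_burst() are hypothetical helper names, not functions from any package. Instead of restarting a pairwise scan at each element, it sorts each user's timestamps and compares each event with the one k - 1 positions later: if those two are at most w seconds apart, then k events fall inside a single w-second window. It also treats a span of exactly w seconds as "within", which is an assumption; use < for a strict bound.

```r
# Hypothetical helper: TRUE if any k events of one user fall within w seconds.
# Assumes t is numeric seconds or POSIXct; a span of exactly w counts as "within".
has_burst <- function(t, k = 10, w = 30) {
  t <- sort(as.numeric(t))
  n <- length(t)
  if (n < k) return(FALSE)             # not enough events overall
  if (t[n] - t[1] <= w) return(TRUE)   # the whole history fits in one window
  # Compare each event with the one k - 1 positions later: if they are at most
  # w seconds apart, those k consecutive events sit inside one window.
  any(t[k:n] - t[1:(n - k + 1)] <= w)
}

# Hypothetical wrapper: apply has_burst to each user of a data.frame
# with columns `user` and `time` (column names are assumptions).
users_with_burst <- function(df, k = 10, w = 30) {
  hits <- vapply(split(df$time, df$user), has_burst, logical(1), k = k, w = w)
  names(which(hits))
}

# Usage on made-up data: "a" has 10 events 3 s apart, "b" has 10 events 10 s apart.
df <- data.frame(user = rep(c("a", "b"), each = 10),
                 time = c(seq(0, 27, by = 3), seq(0, 90, by = 10)))
users_with_burst(df)   # returns "a"
```

Sorting once per user keeps this at O(n log n), and the vectorised difference replaces the manual first-vs-second, first-vs-third loop.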

Perhaps there is a library that helps spot these, or some measure such as standard deviation, or even clustering, that could help identify and extract a smaller subset?

Upvotes: 0

Views: 542

Answers (1)

G. Grothendieck

Reputation: 269694

Assuming one point per second, we generate an input vector, s, of 100 ids. Then we rollapply across it, outputting for each window of 30 points the ids which occur more than 10 times:

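library(zoo)
set.seed(123)
# Test data: 100 ids, one per second. sample() changed its default
# algorithm in R 3.6.0, so newer R versions may print different ids below.
s <- sample(c("a", "b", "c"), 100, replace = TRUE)

# For one window, return the ids occurring more than 10 times as one string
f <- function(x) toString(names(which(table(x) > 10)))
# Apply f to every run of 30 consecutive points (times 1-30, 2-31, ...)
rollapply(s, 30, f)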

 [1] "c"    "c"    "c"    "c"    "c"    "c"    "c"    "c"    "c"    "c"   
[11] "c"    "a, c" "a, c" "a, c" "a, c" "a, c" "a"    "a"    "a"    "a"   
[21] "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"   
[31] "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"    "a"   
[41] "a, b" "b"    "b"    ""     "a"    ""     ""     "b"    "b"    "b"   
[51] "b"    "b"    "b"    "b"    "b"    "b"    "b"    "b, c" "b, c" "b, c"
[61] "b, c" "c"    "b, c" "b, c" "b, c" "b"    "b"    "b"    "b"    "b"   
[71] "b"   

The first point above corresponds to times 1-30, the next to times 2-31, etc.
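To show what the rollapply call computes, here is a base R sketch of the same fixed-width window, with no zoo dependency. roll_ids() is a hypothetical name, and it keeps the answer's assumption of one observation per second, so a window of 30 observations equals 30 seconds. Like f above it uses a strict > threshold; since the question asks for users appearing 10 times, >= may be what is actually wanted.

```r
# Hypothetical base R equivalent of rollapply(s, width, f):
# for each run of `width` consecutive ids, list those seen more than `thresh` times.
roll_ids <- function(s, width = 30, thresh = 10) {
  starts <- seq_len(length(s) - width + 1)
  vapply(starts, function(i) {
    tab <- table(s[i:(i + width - 1)])    # counts per id in this window
    toString(names(tab)[tab > thresh])    # "" when no id qualifies
  }, character(1))
}

# Usage: 11 "x" followed by 11 "y", windows of 12 observations
roll_ids(c(rep("x", 11), rep("y", 11)), width = 12, thresh = 10)
```

The first window contains eleven "x" and qualifies, the middle windows contain at most ten of either id and return "", and the last window qualifies for "y".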

Next time, please provide test data and show the expected answer.

Upvotes: 1
