I have a dataset with three columns, user, action and time, which is a log of user actions. The data looks like this:
user action time
1: 618663 34 1407160424
2: 617608 33 1407160425
3: 89514 34 1407160425
4: 71160 33 1407160425
5: 443464 32 1407160426
---
996: 146038 8 1407161349
997: 528997 9 1407161350
998: 804302 8 1407161351
999: 308922 8 1407161351
1000: 803763 8 1407161352
I want to split each user's actions into sessions based on the action times: actions that happen within a certain period of each other (for example, one hour) should be treated as one session. The simple solution is a for loop that compares the action times for each user, but that's not efficient and my data is very large. Is there a method I can use to overcome this? I can group by user, but splitting a user's actions into different sessions is somewhat difficult for me :-)
Try
library(data.table)
dt <- rbind(
data.table(user=1, action=1:10, time=c(1,5,10,11,15,20,22:25)),
data.table(user=2, action=1:5, time=c(1,3,10,11,12))
)
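setorder(dt, user, time)  # diff() compares consecutive rows, so order by time within each user
# A new session starts at each user's first action and whenever the gap to the previous action exceeds 2
dt[, session := cumsum(c(TRUE, diff(time) > 2)), by = user][]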
# user action time session
# 1: 1 1 1 1
# 2: 1 2 5 2
# 3: 1 3 10 3
# 4: 1 4 11 3
# 5: 1 5 15 4
# 6: 1 6 20 5
# 7: 1 7 22 5
# 8: 1 8 23 5
# 9: 1 9 24 5
# 10: 1 10 25 5
# 11: 2 1 1 1
# 12: 2 2 3 1
# 13: 2 3 10 2
# 14: 2 4 11 2
# 15: 2 5 12 2
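To see how the one-liner numbers the sessions, here is the same computation traced in base R for user 1's times from the example, with the threshold of 2. Each TRUE in gaps > 2 marks the start of a new session, c(TRUE, ...) makes the first action open session 1, and cumsum() counts how many sessions have started so far. The variable names times, gaps, new_session and session are only for illustration.

```r
# Times for user 1 from the example above
times <- c(1, 5, 10, 11, 15, 20, 22:25)

gaps <- diff(times)                # 4 5 1 4 5 2 1 1 1
new_session <- c(TRUE, gaps > 2)   # first action always opens a session
session <- cumsum(new_session)     # running count of session starts
session                            # 1 2 3 3 4 5 5 5 5 5
```

The result matches the session column for user 1 in the output above; data.table simply repeats this per user via by = user.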
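I used a gap of at most 2 time units between consecutive actions to keep them in the same session. The question's timestamps look like Unix times in seconds, so a one-hour cutoff would be 3600 instead of 2.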
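A minimal sketch of the same approach for data shaped like the question's, assuming the time column holds Unix timestamps in seconds and that a gap of more than one hour between consecutive actions starts a new session. The table name actions, the variable gap and the toy rows are hypothetical; the rows are deliberately unsorted to show why setorder() has to come first.

```r
library(data.table)

# Hypothetical toy log: times in seconds, rows deliberately not in time order
actions <- data.table(
  user   = c(1, 2, 1, 1, 2, 1),
  action = c(10, 20, 11, 12, 21, 13),
  time   = c(5000, 100, 0, 600, 4000, 5100)
)

gap <- 3600  # assumed session cutoff: one hour, in seconds

# diff() only makes sense on time-ordered rows, so sort by user, then time
setorder(actions, user, time)

# A new session starts on each user's first action and whenever the gap to the
# previous action exceeds the cutoff; cumsum() turns those starts into 1, 2, 3, ...
actions[, session := cumsum(c(TRUE, diff(time) > gap)), by = user]

print(actions)
```

For user 1 this gives sessions 1, 1, 2, 2 (the 4400-second gap opens session 2), and for user 2 it gives 1, 2. The work happens in vectorised diff() and cumsum() calls per user group, so there is no row-by-row R loop.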