Reputation: 23
I have a series of timestamps representing users' activity on a website. I want to group these timestamps into sessions per user (a session being a run of timestamps from the same user in which consecutive timestamps are no more than 1800 seconds apart). If possible, I would like to add a column called session_nr to my data set (e.g. if a timestamp is more than 1800 seconds after the previous one, or it belongs to a new user, the session number should increase).
A sample dataset looks like this:
user_id date
58683 2015-08-01 07:18:13
58683 2015-08-01 07:18:19
58683 2015-08-01 07:18:33
58683 2015-08-01 07:18:43
58683 2015-08-01 07:18:51
58683 2015-08-01 07:18:58
The data is ordered with respect to each user and with respect to time.
Is there a way to loop through the users and their timestamps in R so that I can add a session number to each row in my data set?
I have started with the following code, but it does not work, and I do not know how to add the session number.
user_session <- function(user, time_limit, data){
  u1 <- data[which(data$user_id == user),]
  Sys.setlocale("LC_TIME", "en_US.UTF-8")
  u1$date <- as.POSIXct(u1$date)
  u1$s.start <- c(TRUE, timediff(u1$date) > time_limit )
  u1$s.stop <- c(u1$s.start[2:length(u1$s.start)], TRUE)
  u1$sessions <- data.frame(
    s.1 = which(u1$s.start), # starts
    s.2 = which(u1$s.stop)) # stops
  return(u1)
}
use <- as.data.frame(unique(data$user_id))
time_limit <- 1800
for (i in dim(use)[1]){
  user <- use[i,1]
  res <- user_session(user, time_limit, data)
}
Upvotes: 2
Views: 421
Reputation: 3242
Here is a dplyr solution:
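library(dplyr)
df %>% group_by(user_id) %>%
  # seconds since this user's previous timestamp (NA on the user's first row);
  # units are fixed to seconds because date - lag(date) picks its units automatically
  mutate(time_since_last = as.numeric(difftime(date, lag(date), units = "secs"))) %>%
  # a session starts on a user's first row or after a gap of more than 1800 seconds
  mutate(new_session = is.na(time_since_last) | time_since_last > 1800) %>%
  # running count of session starts, restarting at 1 for each user
  mutate(session_nr = cumsum(new_session))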
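For anyone who wants to try the same idea without dplyr, here is a minimal base R sketch of the gap-then-cumsum logic. It is not part of the original answer. The sample rows (including user 60001 and the 08:30 and later timestamps), the `tz = "UTC"` setting, and the extra `session_id` column are assumptions made up for illustration, and the data is assumed to be sorted by user and then by time, as in the question.

```r
# Hypothetical sample data: two users; the first user has one gap above 1800 s
df <- data.frame(
  user_id = c(58683, 58683, 58683, 58683, 60001, 60001),
  date = as.POSIXct(c(
    "2015-08-01 07:18:13", "2015-08-01 07:18:19",
    "2015-08-01 08:30:00", "2015-08-01 08:30:05",
    "2015-08-01 09:00:00", "2015-08-01 09:10:00"
  ), tz = "UTC")
)

time_limit <- 1800  # seconds

# Seconds since the previous row, computed within each user (NA on first row)
secs <- as.numeric(df$date)
df$time_since_last <- ave(secs, df$user_id,
                          FUN = function(x) c(NA, diff(x)))

# A session starts on a user's first row or after a gap above the limit
df$new_session <- is.na(df$time_since_last) | df$time_since_last > time_limit

# Per-user numbering: restarts at 1 for every user
df$session_nr <- ave(as.integer(df$new_session), df$user_id, FUN = cumsum)
# session_nr is 1 1 2 2 1 1

# Global numbering: keeps increasing when a new user starts
# (assumes the rows are sorted by user, then by time)
df$session_id <- cumsum(df$new_session)
# session_id is 1 1 2 2 3 3
```

Which numbering to keep depends on how you read the question: `session_nr` matches the grouped dplyr result above, while `session_id` increases for a new user as well, which may be closer to what was asked.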
Upvotes: 3