Sofie

Reputation: 23

Mark timestamps into sessions in R

I have a series of timestamps representing users' activity on a website. I want to group these timestamps into sessions per user, where a session is a run of timestamps from the same user in which consecutive timestamps are no more than 1800 seconds apart. If possible, I would like to add a column called session_nr to my data set (e.g. if a timestamp is more than 1800 seconds after the previous one, or it belongs to a new user, the session number should increase).

A sample dataset looks like this:

user_id             date    
58683      2015-08-01 07:18:13 
58683      2015-08-01 07:18:19 
58683      2015-08-01 07:18:33 
58683      2015-08-01 07:18:43 
58683      2015-08-01 07:18:51 
58683      2015-08-01 07:18:58 
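
To make this easier to reproduce, the six rows above can be rebuilt as a data frame roughly like this. It is only a sketch: the name `data` matches the code further down, and `tz = "UTC"` is an assumption, since the sample does not say which time zone the timestamps are in.

```r
# Rebuild the six sample rows shown above.
data <- data.frame(
  user_id = rep(58683, 6),
  date = as.POSIXct(
    c("2015-08-01 07:18:13", "2015-08-01 07:18:19", "2015-08-01 07:18:33",
      "2015-08-01 07:18:43", "2015-08-01 07:18:51", "2015-08-01 07:18:58"),
    tz = "UTC"  # assumption: the time zone is not given in the sample
  )
)

str(data)
```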

The data is sorted by user and, within each user, by time.

Is there a way to loop through the users and their timestamps in R so that I can add a session number to each row of my data set?

I started with the following code, but I could not get it to work, and I did not know how to add the session number.

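user_session <- function(user, time_limit, data){
  u1 <- data[which(data$user_id == user),]
  # ISO 8601 timestamps parse without any locale change
  u1$date <- as.POSIXct(u1$date)

  # gap to the previous row in seconds; the first row always starts a session
  gaps <- as.numeric(diff(u1$date), units = "secs")
  u1$s.start <- c(TRUE, gaps > time_limit)
  # a row ends a session when the next row starts one; the last row always does
  u1$s.stop  <- c(u1$s.start[-1], TRUE)

  # running count of session starts = session number within this user
  u1$session_nr <- cumsum(u1$s.start)

  return(u1)
}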

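use <- unique(data$user_id)
time_limit <- 1800
res <- vector("list", length(use))
# visit every user (seq_along, not only the last index) and keep each result
for (i in seq_along(use)) {
  res[[i]] <- user_session(use[i], time_limit, data)
}
res <- do.call(rbind, res)  # stack the per-user data frames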

Upvotes: 2

Views: 421

Answers (1)

Edwin

Reputation: 3242

Here is a dplyr solution:

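library(dplyr)
data %>%
  mutate(date = as.POSIXct(date)) %>%
  group_by(user_id) %>%
  # seconds since the user's previous timestamp (NA on each user's first row)
  mutate(time_since_last = as.numeric(difftime(date, lag(date), units = "secs"))) %>%
  # a session starts on a user's first row or after a gap of more than 1800 s
  mutate(new_session = is.na(time_since_last) | time_since_last > 1800) %>%
  mutate(session_nr = cumsum(new_session)) %>%
  ungroup()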
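
If the session number should keep increasing across users rather than restart at 1 for each user (which is how the question words it), a base R sketch along these lines might work. It assumes the data frame is already sorted by user and then by time. The helper name `add_session_nr`, the `tz = "UTC"` choice, and the example rows (a 40-minute gap and a second user) are all made up for illustration.

```r
# Sketch: one running session counter over the whole (pre-sorted) data frame.
# `add_session_nr` is a hypothetical helper name, not part of any package.
add_session_nr <- function(data, time_limit = 1800) {
  n <- nrow(data)
  if (n == 0) {
    data$session_nr <- integer(0)
    return(data)
  }
  secs <- as.numeric(as.POSIXct(data$date, tz = "UTC"))  # assumption: UTC
  gap <- c(Inf, diff(secs))                    # seconds since the previous row
  new_user <- c(TRUE, data$user_id[-1] != data$user_id[-n])
  # a session starts at a new user or after a gap longer than time_limit
  data$session_nr <- cumsum(new_user | gap > time_limit)
  data
}

# Made-up example rows: user 58683 with a 40-minute gap, then a second user.
example <- data.frame(
  user_id = c(58683, 58683, 58683, 58684),
  date = c("2015-08-01 07:18:13", "2015-08-01 07:18:19",
           "2015-08-01 07:58:19", "2015-08-01 07:58:30")
)
result <- add_session_nr(example)
result$session_nr
```

On these four rows the counter comes out as 1, 1, 2, 3: the 40-minute gap opens session 2 and the change of user opens session 3. Working on plain seconds avoids depending on whatever units a difftime happens to pick.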

Upvotes: 3
