sfactor

Reputation: 13062

In R, how do I get statistics on user sessions and the time between them?

I have data representing the sessions of different users, in a format like this:

User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748
...

Note that the timestamps are Unix times (seconds since 1970-01-01 UTC).
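Because Unix times count seconds since 1970-01-01 00:00:00 UTC, subtracting two of them already gives a duration in seconds. To look at them as calendar times, base R's as.POSIXct() can convert them. A minimal sketch with two StartTime values from the table; tz = "UTC" is an assumption (pick your own zone if needed), and the variable names are illustrative:

```r
# Two StartTime values from the table above (illustrative variable names)
start <- c(1291043867, 1290970409)

# Seconds since the Unix epoch -> date-times, displayed in UTC
start_time <- as.POSIXct(start, origin = "1970-01-01", tz = "UTC")
print(start_time)

# Subtracting date-times gives a difftime; the units can be chosen explicitly
gap <- difftime(start_time[1], start_time[2], units = "hours")
print(gap)
```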

I need summary statistics, such as the mean and percentiles, of the number of sessions per user. I also need the average time between successive sessions for all users. How do I go about doing this in R?

Upvotes: 1

Views: 251

Answers (2)

Sacha Epskamp

Reputation: 47551

You can probably get most things you want with ddply and summarise from plyr:

library(plyr)

# Toy data: users 1-3 each have three 2-second sessions, 30 s apart.
# Sys.time() is called only once so that every session lasts 2 s.
now <- as.numeric(Sys.time())
foo <- data.frame(
  User = paste("user", rep(1:3, 3), sep = ""),
  StartTime = now + 1:9 * 10,
  EndTime = now + 1:9 * 10 + 2)

ddply(foo, "User", summarise,
  Nvisits = length(StartTime),                # sessions per user
  AvgTimePerSes = mean(EndTime - StartTime),  # mean session length (s)
  AvgTimeBetweenSes = mean(StartTime[-1] - StartTime[-length(StartTime)])  # mean start-to-start gap (s)
)
   User Nvisits AvgTimePerSes AvgTimeBetweenSes
1 user1       3             2                30
2 user2       3             2                30
3 user3       3             2                30
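The expression StartTime[-1] - StartTime[-length(StartTime)] subtracts each start time from the next one, which is what base R's diff() does. The mean of those differences always equals (last row's start - first row's start)/(n - 1), so it only measures the time between successive sessions when each user's rows are already sorted by StartTime. The toy data above is sorted by construction; the question's data is not. A short check with user2's four start times from the question (variable names are illustrative):

```r
# user2's StartTime values, in the order they appear in the question
# (illustrative variable names)
x <- c(1290970409, 1290897232, 1291128374, 1291032746)

# The answer's expression and diff() give the same successive differences
gaps_file_order <- x[-1] - x[-length(x)]
print(identical(gaps_file_order, diff(x)))  # TRUE

# In file order, two of the three "gaps" are negative
print(gaps_file_order)

# Sorting first gives the actual gaps between consecutive sessions
gaps_sorted <- diff(sort(x))
print(gaps_sorted)

# The mean gap equals (latest start - earliest start) / (number of sessions - 1)
print(mean(gaps_sorted))
print((max(x) - min(x)) / (length(x) - 1))
```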

Edit:

Using the data frame from Roman's answer:

foo <- read.table(textConnection("User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748"), header = TRUE)


library(plyr)

ddply(foo, "User", summarise,
    Nvisits = length(StartTime),
    AvgTime = mean(EndTime - StartTime),
    # sort first: the rows are not in time order (see user2), and without
    # sorting some start-to-start "gaps" come out negative
    AvgBetweenSes = mean(diff(sort(StartTime)))
)
   User Nvisits AvgTime AvgBetweenSes
1 user1       2  107.50      56665.00
2 user2       4  415.75      77047.33
3 user3       2    2.50     122931.00

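This measures the time between sessions from one start to the next. If "time between successive sessions" is instead meant as idle time, from the end of one session to the start of the next, the per-user gaps come out slightly different. A minimal base-R sketch on the question's data, assuming a user's sessions never overlap; idle_gaps is a hypothetical helper and the other variable names are illustrative:

```r
# Session data from the question (Unix times, in seconds; illustrative names)
sessions <- data.frame(
  User      = c("user1", "user2", "user3", "user2", "user1", "user3", "user2", "user2"),
  StartTime = c(1291043867, 1290970409, 1291019561, 1290897232,
                1291100532, 1291142492, 1291128374, 1291032746),
  EndTime   = c(1291044055, 1290972041, 1291019562, 1290897244,
                1291100559, 1291142496, 1291128391, 1291032748)
)

# Hypothetical helper: idle time between one user's successive sessions,
# i.e. each session's start minus the previous session's end
idle_gaps <- function(d) {
  d <- d[order(d$StartTime), ]           # put this user's sessions in time order
  d$StartTime[-1] - d$EndTime[-nrow(d)]  # zero-length when the user has one session
}

gaps <- lapply(split(sessions, sessions$User), idle_gaps)
print(gaps)

# Average idle time per user, in seconds
avg_idle <- sapply(gaps, mean)
print(avg_idle)
```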
Upvotes: 2

Roman Luštrik

Reputation: 70643

This should get you part of the way there.

sfac <- read.table(textConnection("User       StartTime        EndTime
user1      1291043867      1291044055
user2      1290970409      1290972041
user3      1291019561      1291019562
user2      1290897232      1290897244
user1      1291100532      1291100559
user3      1291142492      1291142496
user2      1291128374      1291128391
user2      1291032746      1291032748"), header = TRUE)

sfac$diff <- with(sfac, EndTime - StartTime) # session length in seconds
sfac.split <- split(sfac, sfac$User)

# number of sessions per user
lapply(sfac.split, nrow)

$user1
[1] 2

$user2
[1] 4

$user3
[1] 2
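The question also asks for summary statistics such as the mean and percentiles of the number of sessions per user. Given one count per user, base R's mean(), quantile() and summary() provide these directly. A minimal sketch using the User column from the question; the chosen probabilities are only an example and the variable names are illustrative:

```r
# User column from the question's data (illustrative variable names)
users <- c("user1", "user2", "user3", "user2", "user1", "user3", "user2", "user2")

# One session count per distinct user (same numbers as lapply(sfac.split, nrow))
counts <- as.vector(table(users))
print(counts)

# Mean and selected percentiles of sessions per user
print(mean(counts))
pct <- quantile(counts, probs = c(0.25, 0.5, 0.75, 0.9))
print(pct)

# Min, quartiles, mean and max in one call
print(summary(counts))
```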

# mean session length per user (seconds)
lapply(sfac.split, function(x) mean(x$diff))

$user1
[1] 107.5

$user2
[1] 415.75

$user3
[1] 2.5
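The same split-and-apply pattern also gives percentiles of session length per user. A minimal base-R sketch on the question's data, applying quantile() to each user's lengths and summary() to all sessions together; the 50th and 90th percentiles are an arbitrary choice and the variable names are illustrative:

```r
# Session data from the question (Unix times, in seconds; illustrative names)
sessions <- data.frame(
  User      = c("user1", "user2", "user3", "user2", "user1", "user3", "user2", "user2"),
  StartTime = c(1291043867, 1290970409, 1291019561, 1290897232,
                1291100532, 1291142492, 1291128374, 1291032746),
  EndTime   = c(1291044055, 1290972041, 1291019562, 1290897244,
                1291100559, 1291142496, 1291128391, 1291032748)
)

# Session length in seconds (the same values as the answer's "diff" column)
sessions$Length <- sessions$EndTime - sessions$StartTime

# quantile() on each user's lengths; sapply() returns one column per user,
# so t() puts users in rows and percentiles in columns
len_pct <- t(sapply(split(sessions$Length, sessions$User),
                    quantile, probs = c(0.5, 0.9)))
print(len_pct)

# Distribution of all session lengths, regardless of user
print(summary(sessions$Length))
```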

Upvotes: 2
