Reputation: 13062
I have data representing users' sessions, in a format like this:
User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748
...
Note that the timestamps are Unix times.
I need summary statistics, such as the mean and percentiles, of the number of sessions per user. I also need the average time between successive sessions for all the users. How do I go about doing this in R?
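Because the timestamps are Unix times (seconds since 1970-01-01 00:00:00 UTC), they can be converted to R date-times with as.POSIXct(), which makes them easier to inspect. A minimal sketch, assuming the sample rows above; the data frame name sessions and the added Start, End and Duration columns are illustrative names, not part of the original data:

```r
# Read the sample rows from the question (the name `sessions` is illustrative)
sessions <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# `origin` is passed explicitly because older R versions require it for numeric
# input; tz = "UTC" keeps the printed times from shifting to the local time zone.
sessions$Start <- as.POSIXct(sessions$StartTime, origin = "1970-01-01", tz = "UTC")
sessions$End   <- as.POSIXct(sessions$EndTime,   origin = "1970-01-01", tz = "UTC")

# Session length as a difftime object, expressed in seconds
sessions$Duration <- difftime(sessions$End, sessions$Start, units = "secs")

head(sessions, 3)
```

The numeric columns are kept alongside the date-times, since plain seconds are still the easiest thing to average.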
Upvotes: 1
Views: 251
Reputation: 47551
You can probably get most of what you want with ddply and summarise from plyr:
t0 <- as.numeric(Sys.time())  # read the clock once so EndTime - StartTime is 2 s
foo <- data.frame(
User = paste("user",c(1:3,1:3,1:3),sep=""),
StartTime = t0 + 1:9*10,
EndTime = t0 + 1:9*10 + 2)
library(plyr)
ddply(foo,"User",summarise,
Nvisits = length(StartTime),
AvgTimePerSes = mean(EndTime - StartTime),
AvgTimeBetweenSes = mean(StartTime[-1] - StartTime[-length(StartTime)])
)
User Nvisits AvgTimePerSes AvgTimeBetweenSes
1 user1 3 2 30
2 user2 3 2 30
3 user3 3 2 30
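The expression StartTime[-1] - StartTime[-length(StartTime)] is the vector of successive differences (the same as diff(StartTime)), so its mean telescopes. Writing $t_1, \dots, t_n$ for one user's start times in row order:

$$
\frac{1}{n-1}\sum_{i=1}^{n-1}\left(t_{i+1}-t_i\right)=\frac{t_n-t_1}{n-1}
$$

This equals the average gap between successive sessions only when the rows are sorted by StartTime. The toy data above is generated in increasing order, but the question's sample is not: for user2's rows in the question's order it gives (1291032746 - 1290970409)/3 = 20779, which depends only on the first and last rows rather than on the actual gaps.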
Using the dataframe from Roman's answer:
foo <- read.table(textConnection("User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748"), header = TRUE)
library(plyr)
ddply(foo,"User",summarise,
Nvisits = length(StartTime),
AvgTime = mean(EndTime - StartTime),
# rows are not in time order within each user, so sort before differencing
AvgBetweenSes = mean(diff(sort(StartTime)))
)
   User Nvisits AvgTime AvgBetweenSes
1 user1       2  107.50      56665.00
2 user2       4  415.75      77047.33
3 user3       2    2.50     122931.00
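The question also asks for percentiles of the number of sessions per user. Those counts (the Nvisits column above) are just a numeric vector, so base R's quantile() and summary() apply directly. A minimal sketch that avoids plyr by counting with table(); it re-reads the question's sample rows so it runs on its own, and the names sessions and n_sessions are illustrative:

```r
# Sample rows from the question (the name `sessions` is illustrative)
sessions <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# table() counts rows per user; as.vector() drops the user names
n_sessions <- as.vector(table(sessions$User))

mean(n_sessions)                          # mean number of sessions per user
quantile(n_sessions, c(0.25, 0.5, 0.75))  # quartiles (default type = 7)
summary(n_sessions)                       # min, quartiles, mean and max together
```

quantile() accepts any vector of probabilities, e.g. c(0.9, 0.99) for the heavy-user tail.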
Upvotes: 2
Reputation: 70643
This should get you partially started.
sfac <- read.table(textConnection("User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748"), header = TRUE)
sfac$diff <- with(sfac, EndTime - StartTime) # add difference
sfac.split <- split(sfac, sfac$User)
# number of sessions per user
lapply(sfac.split, nrow)
$user1
[1] 2
$user2
[1] 4
$user3
[1] 2
# mean session duration (seconds) per user
lapply(sfac.split, function(x) mean(x$diff))
$user1
[1] 107.5
$user2
[1] 415.75
$user3
[1] 2.5
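The same split() approach extends to the time between successive sessions. The sample rows are not in time order, so each user's sessions are sorted by StartTime first. "Time between sessions" could mean start-to-start or end-to-start, so this sketch computes both; which one you want is an assumption about your use case, and the object names (gaps, avg_start_gap and so on) are illustrative:

```r
# Sample rows from the question (the name `sessions` is illustrative)
sessions <- read.table(text = "User StartTime EndTime
user1 1291043867 1291044055
user2 1290970409 1290972041
user3 1291019561 1291019562
user2 1290897232 1290897244
user1 1291100532 1291100559
user3 1291142492 1291142496
user2 1291128374 1291128391
user2 1291032746 1291032748", header = TRUE)

# For each user, sort by start time, then compute two kinds of gap:
#   start_gap: next StartTime minus previous StartTime
#   idle_gap:  next StartTime minus previous EndTime (no session open)
gaps <- lapply(split(sessions, sessions$User), function(x) {
  x <- x[order(x$StartTime), ]
  n <- nrow(x)
  list(start_gap = diff(x$StartTime),
       idle_gap  = x$StartTime[-1] - x$EndTime[-n])
})

# Per-user averages (NaN for a user with only one session)
avg_start_gap <- sapply(gaps, function(g) mean(g$start_gap))
avg_idle_gap  <- sapply(gaps, function(g) mean(g$idle_gap))

# Pooled over all users: the mean of every individual start-to-start gap
pooled_start_gap <- mean(unlist(lapply(gaps, `[[`, "start_gap")))

avg_start_gap
avg_idle_gap
pooled_start_gap
```

The pooled figure weights every gap equally, so users with many sessions count for more; mean(avg_start_gap, na.rm = TRUE) would weight each user equally instead.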
Upvotes: 2