I am currently working on an application where I have a data frame, Database, that looks like this:
UserId Hour Date
01     18   01.01.2016
01     18   01.01.2016
01     14   02.01.2016
01     14   02.01.2016
02     21   02.01.2016
02     08   05.01.2016
02     08   05.01.2016
03     23   05.01.2016
Each line represents a session.
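To make the examples reproducible, here is one way the sample could be typed in with base R. Storing UserId and Hour as integers (so "01" becomes 1) is an assumption, chosen because the answers print the IDs as 1, 2 and 3. The answers refer to this same data as Database, df1 or dt.

```r
# Sample sessions from the question, one row per session.
# UserId and Hour are stored as integers here (an assumption).
Database <- data.frame(
  UserId = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  Hour   = c(18L, 18L, 14L, 14L, 21L, 8L, 8L, 23L),
  Date   = c("01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016",
             "02.01.2016", "05.01.2016", "05.01.2016", "05.01.2016"),
  stringsAsFactors = FALSE
)
str(Database)
```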
I need to determine whether the time of the first session of a user has an impact on the number of sessions this user is going to have.
I have tried the summaryBy command:
library(doBy)
first_hour <- summaryBy(UserId + Hour + Date ~ UserId,
FUN=c(head, length, unique), database)
But it doesn't give me the correct result.
My goal is to determine the Hour of each user's first session, how many sessions the user has, and on how many different dates those sessions take place.
Using base R commands, you can write your own function to select the desired information:
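user.info <- function(user){
  # all sessions (rows) of this user; assumes rows are in chronological order
  temp <- subset(Database, Database$UserId == user)
  # first listed hour, number of sessions, number of distinct dates
  return(c(UserId=user, FirstHour=temp$Hour[1], Sessions=nrow(temp), Dates=length(unique(temp$Date))))
}
# apply to every user, then transpose so that each user is one row
t(sapply(unique(Database$UserId), FUN=user.info))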
# UserId FirstHour Sessions Dates
# [1,] 1 18 4 2
# [2,] 2 21 3 2
# [3,] 3 23 1 1
Here, FirstHour is the hour on the first listed row for the given user, Sessions is the number of rows for the user, and Dates is the number of different dates listed for the user.
The function is applied to all unique users and the final table is transposed.
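Because temp$Hour[1] takes the first listed row, it only returns the hour of the first session if the rows are already in chronological order, as they are in the sample. A minimal base R sketch that sorts first is below. It assumes Date is day.month.year (the question does not say) and that, on the same date, a smaller Hour means an earlier session; the names sorted, per_user and user_summary are illustrative.

```r
# Sample data from the question (UserId/Hour as integers: an assumption)
Database <- data.frame(
  UserId = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  Hour   = c(18L, 18L, 14L, 14L, 21L, 8L, 8L, 23L),
  Date   = c("01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016",
             "02.01.2016", "05.01.2016", "05.01.2016", "05.01.2016"),
  stringsAsFactors = FALSE
)

# Parse dates (assumed day.month.year) and sort sessions chronologically
Database$Day <- as.Date(Database$Date, format = "%d.%m.%Y")
sorted <- Database[order(Database$UserId, Database$Day, Database$Hour), ]

# One summary row per user: hour of the earliest session,
# number of sessions, and number of distinct session dates
per_user <- lapply(split(sorted, sorted$UserId), function(s) {
  data.frame(UserId    = s$UserId[1],
             FirstHour = s$Hour[1],
             Sessions  = nrow(s),
             Dates     = length(unique(s$Day)))
})
user_summary <- do.call(rbind, per_user)
rownames(user_summary) <- NULL
user_summary
```

Sorting before splitting means the result no longer depends on how the rows happen to be stored.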
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order the rows by 'Date' and, grouped by 'UserId', get the first 'Hour', the total number of sessions (.N) and the number of unique Date elements (uniqueN(Date)).
library(data.table)
setDT(df1)[order(UserId, as.Date(Date, "%m.%d.%Y")),.(Hour = Hour[1L],
Sessions = .N, DifferSessionDate = uniqueN(Date)) , by = UserId]
# UserId Hour Sessions DifferSessionDate
#1: 1 18 4 2
#2: 2 21 3 2
#3: 3 23 1 1
You could also do this using dplyr:
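library(dplyr)
# sort each user's sessions chronologically (assumes day.month.year dates),
# so that first(Hour) is the hour of the user's first session
dt %>% arrange(UserId, as.Date(Date, "%d.%m.%Y"), Hour) %>%
  group_by(UserId) %>%
  summarise(FirstHour = first(Hour),
            NumSessions = n(),
            NumDates = length(unique(Date)))
  UserId FirstHour NumSessions NumDates
1      1        18           4        2
2      2        21           3        2
3      3        23           1        1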
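For the underlying question (does the hour of the first session relate to the number of sessions?), one rough option is a correlation or a simple linear model on the per-user table. This sketch types in the values printed by the base R and data.table answers. With only three users the result is not meaningful; treating Hour as a plain number ignores that 23:00 and 00:00 are adjacent; and the choice of cor() and lm() is an assumption, not something the question specifies.

```r
# Per-user values as printed by the base R and data.table answers
user_summary <- data.frame(
  UserId    = 1:3,
  FirstHour = c(18, 21, 23),
  Sessions  = c(4, 3, 1)
)

# Pearson correlation between the hour of the first session and session count
r <- cor(user_summary$FirstHour, user_summary$Sessions)

# Simple linear model: change in expected sessions per hour of first session
fit <- lm(Sessions ~ FirstHour, data = user_summary)
r
coef(fit)
```

On the full data set, a categorical grouping of FirstHour (for example morning versus evening) may be easier to interpret than a linear slope.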