Reputation: 31
I have a data frame that contains, for each session (column "session"), a sequence of actions (column "action"). Actions can be repeated within the same session (e.g. a->b->a for session 01), because what I am interested in is the order in which they happen:
x <- data.frame(
  session = c("01", "01", "01", "02", "02", "02", "03", "03"),
  action = c("a", "b", "a", "c", "a", "c", "a", "b"))
I need to convert it to the transactions format so that I can use the 'arules' package, for example to apply the apriori algorithm. The desired output would be:
01 a,b,a
02 c,a,c
03 a,b
where each session is followed by its exact sequence of actions.
Which approach do you suggest?
Thank you.
Upvotes: 0
Views: 409
Reputation: 887901
With base R, we can use aggregate:
aggregate(action~ session, x, FUN = toString)
# session action
#1 01 a, b, a
#2 02 c, a, c
#3 03 a, b
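The same grouping can also be done with split, which keeps each session's actions as a vector in their original row order. This is only a minimal sketch, assuming the goal is literally one text line per session as in the question; the names `seqs` and `lines` are illustrative and not from the original post.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps plain strings
x <- data.frame(
  session = c("01", "01", "01", "02", "02", "02", "03", "03"),
  action = c("a", "b", "a", "c", "a", "c", "a", "b"),
  stringsAsFactors = FALSE
)

# One character vector per session, in the original row order
seqs <- split(x$action, x$session)

# Collapse each vector and prefix it with its session id
lines <- paste(names(seqs), vapply(seqs, paste, character(1), collapse = ","))
lines
# [1] "01 a,b,a" "02 c,a,c" "03 a,b"
```

Keeping `seqs` as a list leaves the repeats and the order intact for whatever step comes next.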
If we need to convert to transactions:
library(arules)
# note: transactions are item sets, so repeated actions and their order are not kept
as(split(x$action, x$session), "transactions")
Upvotes: 1
Reputation: 16121
x <- data.frame(session = c("01", "01", "01", "02", "02", "02", "03", "03"),
                action = c("a", "b", "a", "c", "a", "c", "a", "b"))
library(dplyr)
x %>%
  group_by(session) %>%
  summarise(action = paste0(action, collapse = ","))
# # A tibble: 3 x 2
# session action
# <fct> <chr>
# 1 01 a,b,a
# 2 02 c,a,c
# 3 03 a,b
Upvotes: 0