Reputation: 3
I am working with the arules package in R. I have a "long" data frame A of transactions with two columns, like this one:
TID itemNO
2393117 SJUMRE14E
2393118 ATVBCT14
2393127 L07EGG13E
2393128 MCM3100W
2393130 S1017501771
2393131 S6LN13X
2393133 SPCCLI551C
2393133 SPCPGI550BK
2393133 SPCCLI551Y
2393133 SPCCLI551C
As we can see, the last 4 rows belong to one transaction (TID 2393133), and I need to convert the data to the "transactions" class so that I can use it with the apriori function.
From what I have been able to find so far, the conversion to "transactions" is done like this:
TransAction <- as(split(A[,"itemNO"], A[,"TID"]), "transactions")
However, since I have over 2.5 million transactions, this is extremely time-consuming: it takes up to 1 hour. The bottleneck is the split() function. Is there any way to speed up the process with the plyr or data.table packages, replacing the split() call?
Upvotes: 0
Views: 1022
Reputation: 10473
Here is one way I do it, which I find to be faster. The idea is to create a wide data frame of 0/1 values, with one row per transaction and one column per item, and then convert that to transactions. It does not require any split.
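library(dplyr)
library(tidyr)
library(arules)

# df is the long data frame with columns TID and itemNO (called A in the question)
df <- df %>%
  select(TID, itemNO) %>%
  distinct() %>%                      # drop repeated TID/item pairs
  mutate(value = 1) %>%
  spread(itemNO, value, fill = 0)     # one row per TID, one 0/1 column per item

# Drop the TID column and coerce the 0/1 matrix to transactions
itemMatrix <- as(as.matrix(df[, -1]), 'transactions')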
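To check the result on a small input, here is a minimal sketch that runs the same steps on the ten sample rows from the question. The names A, wide, m and trans are only illustrative, and copying the TIDs into the row names is my own addition, on the assumption that arules carries row names over as transaction IDs.

```r
library(dplyr)
library(tidyr)
library(arules)

# Sample rows copied from the question; SPCCLI551C appears twice in TID 2393133
A <- data.frame(
  TID = c(2393117, 2393118, 2393127, 2393128, 2393130, 2393131,
          2393133, 2393133, 2393133, 2393133),
  itemNO = c("SJUMRE14E", "ATVBCT14", "L07EGG13E", "MCM3100W",
             "S1017501771", "S6LN13X", "SPCCLI551C", "SPCPGI550BK",
             "SPCCLI551Y", "SPCCLI551C"),
  stringsAsFactors = FALSE
)

wide <- A %>%
  select(TID, itemNO) %>%
  distinct() %>%                      # removes the repeated SPCCLI551C row
  mutate(value = 1) %>%
  spread(itemNO, value, fill = 0)     # one row per TID, one 0/1 column per item

m <- as.matrix(wide[, -1])            # 0/1 matrix without the TID column
rownames(m) <- wide$TID               # assumption: row names become transaction IDs

trans <- as(m, "transactions")

length(trans)   # number of transactions
size(trans)     # number of distinct items in each transaction
```

The wide table holds one cell for every transaction/item combination, so its memory use grows with the number of distinct items. It is worth trying this on a subset of the 2.5 million transactions first.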
Upvotes: 4