user6710328

Reputation: 3

"long" dataframe to "transactions" for arules in R

I am working with the arules package in R. I have a "long" data frame A of transactions with two columns, like this:

TID      itemNO
2393117  SJUMRE14E
2393118  ATVBCT14
2393127  L07EGG13E
2393128  MCM3100W
2393130  S1017501771
2393131  S6LN13X
2393133  SPCCLI551C
2393133  SPCPGI550BK
2393133  SPCCLI551Y
2393133  SPCCLI551C

As you can see, the last four rows belong to one transaction (TID 2393133), and I need to convert the data to the "transactions" class so I can use it with the apriori() function.
From what I have found so far, the conversion to "transactions" would be done like this:

TransAction <- as(split(A[, "itemNO"], A[, "TID"]), "transactions")
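For a reproducible starting point, here is a minimal sketch that builds the sample data above as a data frame `A` (assumption: `TID` stored as numeric, `itemNO` as character) and applies the split()-based conversion to it. Because SPCCLI551C appears twice in TID 2393133, arules should drop the duplicate, probably with a warning, so that transaction ends up with three items.

```r
library(arules)

# Sample data from the question (assumed types: numeric TID, character itemNO)
A <- data.frame(
  TID = c(2393117, 2393118, 2393127, 2393128, 2393130, 2393131,
          2393133, 2393133, 2393133, 2393133),
  itemNO = c("SJUMRE14E", "ATVBCT14", "L07EGG13E", "MCM3100W",
             "S1017501771", "S6LN13X", "SPCCLI551C", "SPCPGI550BK",
             "SPCCLI551Y", "SPCCLI551C"),
  stringsAsFactors = FALSE
)

# split() builds one character vector of items per TID; the TIDs become
# the list names and therefore the transaction IDs.
# The repeated SPCCLI551C in TID 2393133 is removed during coercion.
trans <- as(split(A[, "itemNO"], A[, "TID"]), "transactions")

summary(trans)
```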

However, since I have over 2.5 million transactions, this is extremely time-consuming and takes up to an hour. The bottleneck is the split() function. Is there any way to speed up the process, for example with the plyr or data.table packages, by replacing split()?
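Since data.table was mentioned, one possible replacement for split() is a single grouped aggregation that collects the distinct items of each TID into a list column. This is only a sketch on the sample data; it has not been benchmarked here, so whether it is actually faster than split() on 2.5 million transactions is an assumption to check. The names `dt`, `grouped` and `items` are illustrative.

```r
library(data.table)
library(arules)

# Sample data from the question
A <- data.frame(
  TID = c(2393117, 2393118, 2393127, 2393128, 2393130, 2393131,
          2393133, 2393133, 2393133, 2393133),
  itemNO = c("SJUMRE14E", "ATVBCT14", "L07EGG13E", "MCM3100W",
             "S1017501771", "S6LN13X", "SPCCLI551C", "SPCPGI550BK",
             "SPCCLI551Y", "SPCCLI551C"),
  stringsAsFactors = FALSE
)

# One grouped pass: a list column holding the distinct items of each TID
dt <- as.data.table(A)
grouped <- dt[, .(items = list(unique(itemNO))), by = TID]

# Name the list elements by TID so they become the transaction IDs
items <- grouped$items
names(items) <- grouped$TID

trans <- as(items, "transactions")
summary(trans)
```

Calling unique() inside the group also removes repeated items up front, so arules has no duplicates to clean up during coercion.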

Upvotes: 0

Views: 1022

Answers (1)

Gopala

Reputation: 10473

Here is one way I do it, which I find to be faster. The idea is to create a wide data frame of 0/1 values (one row per transaction, one column per item) and then convert that to transactions. It does not require split().

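library(dplyr)
library(tidyr)
library(arules)

# Build a wide 0/1 data frame: one row per TID, one column per item
df <- A %>%
  select(TID, itemNO) %>%
  distinct() %>%                   # drop repeated items within a TID
  mutate(value = 1) %>%
  spread(itemNO, value, fill = 0)  # absent TID/item pairs become 0

# Drop the TID column and coerce the 0/1 matrix to transactions
itemMatrix <- as(as.matrix(df[, -1]), 'transactions')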
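Applied to the sample data, this pipeline can be sketched as below. Two parts are assumptions of this sketch rather than part of the answer: the TIDs are kept as row names so they survive as transaction IDs (the `df[, -1]` step above discards them), and an apriori() call with illustrative thresholds shows the result being used.

```r
library(dplyr)
library(tidyr)
library(arules)

# Sample data from the question
A <- data.frame(
  TID = c(2393117, 2393118, 2393127, 2393128, 2393130, 2393131,
          2393133, 2393133, 2393133, 2393133),
  itemNO = c("SJUMRE14E", "ATVBCT14", "L07EGG13E", "MCM3100W",
             "S1017501771", "S6LN13X", "SPCCLI551C", "SPCPGI550BK",
             "SPCCLI551Y", "SPCCLI551C"),
  stringsAsFactors = FALSE
)

# Wide 0/1 table: one row per TID, one column per distinct item
wide <- A %>%
  select(TID, itemNO) %>%
  distinct() %>%                   # removes the repeated SPCCLI551C row
  mutate(value = 1) %>%
  spread(itemNO, value, fill = 0)

# Drop the TID column but keep the IDs as row names (assumed to be wanted
# as transaction IDs)
m <- as.matrix(wide[, -1])
rownames(m) <- wide$TID

trans <- as(m, "transactions")

# Illustrative thresholds only; tune support/confidence for real data
rules <- apriori(trans, parameter = list(support = 0.1, confidence = 0.5))
inspect(rules)
```

One trade-off to keep in mind: as.matrix() produces a dense matrix with one row per transaction and one column per distinct item, so with 2.5 million transactions and many items this intermediate object can become very large in memory.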

Upvotes: 4
