Reputation: 489
I have historical order data in a data frame with two columns, OrderID and Item, and about 1 million rows. I want to mine association rules from it, and to use the arules package I have to convert the data frame into the transactions format. The conversion is very slow: for smaller data frames with the same structure (300K rows) it finishes in several seconds, but for the full data frame it seems to take forever. Since I will be working with even larger data sets, is there a more efficient way to do this?
I am using a fairly powerful machine, and the conversion succeeds on the smaller data frames. Below is the code I use for the conversion.
library(tidyverse)
library(arules)
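# Toy example with the same two-column structure as the real data
OrderID <- c("0001", "0001", "0002", "0002")
Item <- c("ProductA", "ProductB", "ProductB", "ProductC")
df <- data.frame(OrderID, Item)
df$OrderID <- as.factor(df$OrderID)
df$Item <- as.factor(df$Item)
# Split the items by order, then coerce the resulting list to transactions
df_trans <- as(split(df[, "Item"], df[, "OrderID"]), "transactions")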
Upvotes: 1
Views: 1056
Reputation: 3075
This is a common issue. Here is a solution from the manual page for ?transactions:
## example 4: creating transactions from a data.frame with
## transaction IDs and items (by converting it into a list of transactions first)
a_df3 <- data.frame(
TID = c(1,1,2,2,2,3),
item=c("a","b","a","b","c", "b")
)
a_df3
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")
trans4
inspect(trans4)
## Note: This is very slow for large datasets. It is much faster to
## read transactions using read.transactions() with format = "single".
## This can be done using an anonymous file.
write.table(a_df3, file = tmp <- file(), row.names = FALSE)
trans4 <- read.transactions(tmp, format = "single",
header = TRUE, cols = c("TID", "item"))
close(tmp)
inspect(trans4)
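Applied to the data frame from the question, the same anonymous-file approach might look like the sketch below. It assumes the columns are named OrderID and Item as in the question, and that the default whitespace separator is acceptable for the item names; it has not been timed on a million-row data set, so treat it as a starting point rather than a measured fix.

```r
library(arules)

# Toy data with the same two-column shape as the question's data frame
df <- data.frame(
  OrderID = c("0001", "0001", "0002", "0002"),
  Item    = c("ProductA", "ProductB", "ProductB", "ProductC")
)

# Write the data frame to an anonymous file, then read it back in
# "single" format (one row per transaction ID / item pair)
write.table(df, file = tmp <- file(), row.names = FALSE)
df_trans <- read.transactions(tmp, format = "single",
                              header = TRUE, cols = c("OrderID", "Item"))
close(tmp)

summary(df_trans)
```

The cols argument names the transaction ID column first and the item column second, so only those two names need to change for a differently labelled data frame.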
Upvotes: 1