How to deal with large data in Apriori algorithm?

Question

I would like to analyze customer data from my e-shop using association rules. These are steps that I took:

First: My dataframe raw_data has three columns ["id_customer","id_product","product_quantity"] and it contains 700,000 rows.

Second: I reorder my dataframe and I get a dataframe with 680,000 rows and 366 columns:

customer = (
    raw_data.groupby(["id_customer", "product_id"])["product_quantity"]
    .sum()
    .unstack()
    .reset_index()
    .fillna(0)
    .set_index("id_customer")
)
customer[customer != 0] = 1

Finally: I would like to create a frequency of items:

from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(customer, min_support=0.00001, use_colnames=True)

but now I got an error MemoryError: Unable to allocate 686. GiB for an array with shape (66795, 2, 689587) and data type float64

How to fix it? Or how to compute frequent_itemsets without using apriori function?

How to deal with large data in Apriori algorithm?

Answers (1)

Related Questions