Reputation: 122
I would like to analyze customer data from my e-shop using association rules. These are the steps I took:
First: My dataframe raw_data has three columns ["id_customer","id_product","product_quantity"] and it contains 700,000 rows.
Second: I pivot the dataframe into a customer-by-product matrix with 680,000 rows and 366 columns:
customer = (
    raw_data.groupby(["id_customer", "id_product"])["product_quantity"]
    .sum()
    .unstack()   # id_customer as index, one column per product
    .fillna(0)
)
customer[customer != 0] = 1  # binarize: 1 if the customer ever bought the product
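A side note on the matrix itself: at 680,000 × 366 in float64 it already takes about 2 GB, and newer mlxtend versions warn when the one-hot input is not boolean, so casting to bool (one byte per cell instead of eight) seems worth doing first:

customer = customer.astype(bool)  # boolean one-hot input for mlxtend; 1 byte per cell instead of 8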
Finally: I compute the frequent itemsets:
from mlxtend.frequent_patterns import apriori
frequent_itemsets = apriori(customer, min_support=0.00001, use_colnames=True)
but this raises:

MemoryError: Unable to allocate 686. GiB for an array with shape (66795, 2, 689587) and data type float64
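If I read the shape correctly, it comes from the candidate 2-itemsets: with 366 products there are C(366, 2) = 66,795 pairs, and each is materialized against all 689,587 rows in float64, which reproduces the 686 GiB figure:

from math import comb

n_pairs = comb(366, 2)                       # 66,795 candidate 2-itemsets
n_rows = 689_587                             # rows from the error message
print(n_pairs * 2 * n_rows * 8 / 1024**3)    # float64 = 8 bytes -> ~686 GiB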
How can I fix this? Or how can I compute frequent_itemsets without using the apriori function?
Upvotes: 1
Views: 2190
Reputation: 1858
If your data is too large to fit in memory, efficient_apriori lets you pass a function returning a generator instead of a list:
from efficient_apriori import apriori as ap

def data_generator(df):
    """
    Return a callable that produces a fresh generator each time it is
    called, since efficient_apriori makes several passes over the data.
    Use this approach if the data is too large to fit in memory.
    """
    def data_gen():
        for row in df.itertuples(index=False):
            yield tuple(row)  # one transaction per row
    return data_gen

transactions = data_generator(df)
itemsets, rules = ap(transactions, min_support=0.9, min_confidence=0.6)
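Applied to the shop data from the question, a minimal sketch could look like this (column names "id_customer"/"id_product" taken from the question; the min_support value is just a placeholder to tune). Note that efficient_apriori expects one tuple of items per customer, not the one-hot matrix built for mlxtend:

import pandas as pd
from efficient_apriori import apriori as ap

def shop_transactions(raw_data: pd.DataFrame):
    # Callable returning a fresh generator on every call,
    # as efficient_apriori iterates over the data several times.
    def data_gen():
        for _, products in raw_data.groupby("id_customer")["id_product"]:
            yield tuple(products)  # one transaction per customer
    return data_gen

transactions = shop_transactions(raw_data)
itemsets, rules = ap(transactions, min_support=0.01, min_confidence=0.6)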
Upvotes: 1