How to scan the candidate itemset by using the item matrix

Question

I am working on a small data mining project and ran into a problem: I need to scan the 'item matrix' and count the occurrences of each candidate itemset.

This is what the candidate itemsets look like. It is a list of several frozensets.
[{'🌭', '🍔', '🍕'},
 {'🍆', '🍉', '🍑'},
 {'🍆', '🍊', '🍑'},
 {'🌭', '🍔', '🍦'},
 {'🌭', '🌮', '🍕'}]

I also have the item matrix. For every candidate in my list of candidate itemsets, I need to check whether it is a subset of each row of the item matrix. In other words, I have to check each candidate against every row and sum up the number of rows that contain it.

I have tried nested for loops: for each row of the matrix, I check whether each candidate is a subset of that row, and if it is, I add 1 to its count. However, I cannot store the counts in a dictionary keyed by candidate, because a set is unhashable. I am getting rather frustrated with this problem.
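For reference, the loop described above can be sketched as follows. A plain `set` is unhashable, but a `frozenset` is hashable and can be used as a dictionary key. The `rows` list here is made up purely for illustration (it is not the real item matrix), so treat this as a minimal sketch rather than a finished solution.

```python
# Candidate itemsets from the question, stored as frozensets so they can be dict keys.
candidates = [
    frozenset({'🌭', '🍔', '🍕'}),
    frozenset({'🍆', '🍉', '🍑'}),
    frozenset({'🍆', '🍊', '🍑'}),
    frozenset({'🌭', '🍔', '🍦'}),
    frozenset({'🌭', '🌮', '🍕'}),
]

# Hypothetical transactions, invented for this sketch only.
rows = [
    {'🌭', '🍔', '🍕', '🍦'},
    {'🍆', '🍉', '🍑'},
    {'🌭', '🍔', '🍕'},
    {'🍆', '🍊', '🍑', '🍉'},
]

# One counter per candidate, keyed by the (hashable) frozenset.
counts = {c: 0 for c in candidates}
for row in rows:
    for c in candidates:
        if c <= row:  # True when every item of c is in the row
            counts[c] += 1
```

A candidate written as a plain set can be converted with `frozenset(candidate)` before it is used as a key.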

To make the example reproducible, I changed the emoji to strings.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

candidate_set = [{'Apple', 'Milk'}, {'Eggs', 'Milk'}, {'Onion', 'Yogurt'}]

For each candidate, for example 'Apple' and 'Milk', I want to find the total number of rows in which all of its items are True.
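One possible way to get these counts from the boolean matrix is to select the candidate's columns and test whether all of them are True in a row. This is only a sketch: to keep it free of the mlxtend dependency, `df` is rebuilt here with plain pandas, on the assumption that it has the same one-hot boolean layout as the `TransactionEncoder` output above. The `items` and `counts` names are illustrative.

```python
import pandas as pd

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# Assumption: a boolean item matrix, one row per transaction, one column per item.
items = sorted({item for row in dataset for item in row})
df = pd.DataFrame([[item in row for item in items] for row in dataset],
                  columns=items)

candidate_set = [{'Apple', 'Milk'}, {'Eggs', 'Milk'}, {'Onion', 'Yogurt'}]

# For each candidate, keep only its columns, require all of them to be True
# in a row, and count the rows where that holds.
counts = {
    frozenset(c): int(df[list(c)].all(axis=1).sum())
    for c in candidate_set
}
```

The keys are converted to `frozenset` because plain sets cannot be dictionary keys. With the data above, `counts[frozenset({'Eggs', 'Milk'})]` is 2.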

Any help would be appreciated! Thanks
