cat1234

Reputation: 49

Why is my apriori function returning letters instead of items? (wrong output)

The code:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from apyori import apriori

dataset = [['egg','bread'],['milk'],['apple','milk'],['diapers'],['orange','egg','milk']]
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
final_df = pd.DataFrame(te_ary, columns=te.columns_)
print(final_df)

frq_itemsets= apriori(final_df, min_support=0.5, use_colnames=True)  
association_results = list(frq_itemsets)
print(association_results)

The output:

   apple  bread  china    egg  embroidery   milk
0  False   True  False   True       False  False
1  False  False  False  False       False   True
2   True  False  False  False       False   True
3  False  False  False  False        True  False
4  False  False   True   True       False   True
[RelationRecord(items=frozenset({'a'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'a'}), confidence=0.5, lift=1.0)]), RelationRecord(items=frozenset({'e'}), support=0.6666666666666666, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'e'}), confidence=0.6666666666666666, lift=1.0)]), RelationRecord(items=frozenset({'i'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'i'}), confidence=0.5, lift=1.0)])]

What am I doing wrong? I've searched everywhere on SO but I can't seem to find a question like this.

Thanks in advance. I hope it's not a stupid question. Can anyone help?

Upvotes: 1

Views: 562

Answers (2)

Michelle R

Reputation: 11

I ran into this same problem! For me, the solution was to fix the input to the one-hot encoding step. In simplest terms, depending on your data set, this means converting the DataFrame into a list of lists of strings before passing it to TransactionEncoder.

df = df.astype(str)
str_list = df.values.tolist()  # one list of strings per row
te = TransactionEncoder()
te_ary = te.fit(str_list).transform(str_list)

That fixed it for me!
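
For reference, here is a rough sketch of what one-hot encoding a list of transactions amounts to. It is plain Python, not mlxtend's implementation, and the helper name `one_hot_encode` is made up for illustration. The point is that the input has to be a list of transactions, each one a list of item strings.

```python
# Pure-Python sketch of one-hot encoding a list of transactions.
# Illustrative only: one_hot_encode is a hypothetical helper,
# not part of mlxtend or apyori.

def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for transaction in transactions for item in transaction})
    rows = [[column in transaction for column in columns]
            for transaction in transactions]
    return columns, rows


dataset = [['egg', 'bread'], ['milk'], ['apple', 'milk'],
           ['diapers'], ['orange', 'egg', 'milk']]

columns, rows = one_hot_encode(dataset)
print(columns)
for row in rows:
    print(row)
```

Each row has one True/False flag per distinct item, which is the shape the DataFrame-based apriori expects.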

Upvotes: 1

Ben.T

Reputation: 29635

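I believe apriori is being misused here, because the function expects different input depending on which package it comes from: the mlxtend version takes the one-hot encoded DataFrame, while the apyori version takes the raw list of transactions. The difference is shown below.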

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

dataset = [['egg','bread'],['milk'],['apple','milk'],
           ['diapers'],['orange','egg','milk']]
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
final_df = pd.DataFrame(te_ary, columns=te.columns_)
print(final_df)

from mlxtend.frequent_patterns import apriori
# this method returns a dataframe, no need to use a list
df_freq = apriori(final_df, min_support=0.5, use_colnames=True)  
print(df_freq) 
#    support itemsets
# 0      0.6   (milk)

from apyori import apriori
# this method returns a generator hence the use of list to get the result
print(list(apriori(dataset, min_support=0.5)))
# [RelationRecord(items=frozenset({'milk'}), support=0.6, 
#                 ordered_statistics=[OrderedStatistic(items_base=frozenset(), 
#                                     items_add=frozenset({'milk'}), 
#                                     confidence=0.6, lift=1.0)])]
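
As for why letters show up: iterating over a pandas DataFrame yields its column labels, so when the DataFrame is handed to apyori, each label (a string) is most likely treated as one transaction, and iterating a string yields its characters. The sketch below reproduces the supports from the question in plain Python under that assumption. The column labels are copied from the printed output in the question, and `single_item_supports` is a hypothetical helper, not apyori's code.

```python
# Sketch: reproduce the letter supports from the question without pandas or apyori.
# Assumption: the "transactions" apyori saw were the column labels of final_df,
# because iterating over a DataFrame yields its column labels.
column_labels = ['apple', 'bread', 'china', 'egg', 'embroidery', 'milk']


def single_item_supports(transactions, min_support):
    """Support of each single item that appears in at least min_support of transactions."""
    transactions = [set(t) for t in transactions]  # set('egg') == {'e', 'g'}
    items = sorted(set().union(*transactions))
    supports = {}
    for item in items:
        support = sum(item in t for t in transactions) / len(transactions)
        if support >= min_support:
            supports[item] = support
    return supports


letter_supports = single_item_supports(column_labels, min_support=0.5)
print(letter_supports)
```

Only 'a', 'e' and 'i' reach the 0.5 threshold, with supports 0.5, 0.666... and 0.5, which matches the RelationRecord output in the question. Note also that use_colnames is an mlxtend parameter, so it has no meaning for the apyori function.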

Upvotes: 1
