Reputation: 11
I have a dataset from an Excel file containing 75 items. I want to obtain an equation that calculates the 75th element from the values of the other 74 elements.
I am using Anaconda and Python to mine the data and extract rules with the Apriori algorithm. I get rules, but they are not useful because they are in an incomprehensible form. My question is: how can I convert the rules into an equation that calculates the 75th element from the values of the other 74 elements?
# Import libraries and dataset
import pandas as pd
from apyori import apriori

number_of_raws = 4000
number_of_columns = 75
store_data = pd.read_excel('C:\\Users\\smaol\\Aro\\1.xlsx', header=1)

# Convert to list
transactions = []
for i in range(0, number_of_raws):
    for j in range(0, number_of_columns):
        transactions.append(str(store_data.values[i, j]))

# Find rules
association_rules = apriori(transactions, min_support=0.025, min_confidence=0.5, min_lift=3, min_length=2)
association_results = list(association_rules)
The result:
[RelationRecord(items=frozenset({'a', 'n'}), support=0.043116666666666664, ordered_statistics=[OrderedStatistic(items_base=frozenset({'a'}), items_add=frozenset({'n'}), confidence=1.0, lift=23.192887514495556), OrderedStatistic(items_base=frozenset({'n'}), items_add=frozenset({'a'}), confidence=1.0, lift=23.192887514495556)])]
[RelationRecord(items=frozenset({'a', 'n'}), support=0.043116666666666664, ordered_statistics=[OrderedStatistic(items_base=frozenset({'a'}), items_add=frozenset({'n'}), confidence=1.0, lift=23.192887514495556), OrderedStatistic(items_base=frozenset({'n'}), items_add=frozenset({'a'}), confidence=1.0, lift=23.192887514495556)])]
[RelationRecord(items=frozenset({'a', 'n'}), support=0.043116666666666664, ordered_statistics=[OrderedStatistic(items_base=frozenset({'a'}), items_add=frozenset({'n'}), confidence=1.0, lift=23.192887514495556), OrderedStatistic(items_base=frozenset({'n'}), items_add=frozenset({'a'}), confidence=1.0, lift=23.192887514495556)])]
[RelationRecord(items=frozenset({'a', 'n'}), support=0.043116666666666664, ordered_statistics=[OrderedStatistic(items_base=frozenset({'a'}), items_add=frozenset({'n'}), confidence=1.0, lift=23.192887514495556), OrderedStatistic(items_base=frozenset({'n'}), items_add=frozenset({'a'}), confidence=1.0, lift=23.192887514495556)])]
Upvotes: 1
Views: 205
Reputation: 77454
If every row always has exactly the same number of values, this most likely is not "market basket" data. The position of each value probably carries information, and that information is lost when you transform the rows into itemsets.
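To illustrate the point about lost position information, here is a minimal sketch on a made-up 3x3 table (the values and the `colN=` naming are hypothetical, not taken from your file). It contrasts a flat list of cell strings with one transaction per row. With a flat list, any routine that iterates over each "transaction" walks the characters of a single string, which would be consistent with the items 'a' and 'n' in your output if many cells are NaN.

```python
# Hypothetical 3-row, 3-column table standing in for the spreadsheet
rows = [
    ["red", "nan", "small"],
    ["blue", "nan", "large"],
    ["red", "nan", "large"],
]

# Flat list of strings: every cell becomes its own "transaction", and
# iterating over a string yields its individual characters
flat = [str(cell) for row in rows for cell in row]
flat_items = set(flat[1])  # items seen in the "transaction" 'nan'
print(flat_items)

# One transaction per row, with the column index kept inside each item
# so that the same value in different columns stays distinguishable
by_row = [[f"col{j}={cell}" for j, cell in enumerate(row)] for row in rows]
print(by_row[0])
```

Even with the row-wise encoding, itemset mining only reports co-occurrences, so it still will not give you a formula for the last column.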
What about using a standard prediction algorithm, such as decision trees or random forests?
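A rough sketch of what that could look like with scikit-learn, assuming the 75th column is numeric (use `DecisionTreeClassifier` instead if it is categorical). The data below is synthetic, with 5 feature columns instead of 74, and the names `col_0` ... `col_4` as well as the relationship between columns are invented for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the spreadsheet: 200 rows, 5 feature columns
# (the real data would have 74) and one target column
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]  # hypothetical relationship, illustration only

feature_names = [f"col_{j}" for j in range(X.shape[1])]

# A shallow tree keeps the learned rules small enough to read
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# export_text renders the fitted tree as nested threshold conditions
rules = export_text(tree, feature_names=feature_names)
print(rules)

# Predict the target for a few rows from their feature values
predictions = tree.predict(X[:3])
print(predictions)
```

The printed rules are threshold conditions on individual columns rather than a single algebraic equation, but they can be read and applied by hand. A random forest usually predicts better at the cost of that readability.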
Don't use "apyori". It's an incorrect implementation.
Upvotes: 1