Reputation: 23
I am struggling with association rule mining on a data set that has a lot of binary attributes but also a lot of categorical attributes. Converting the categorical attributes to binary is theoretically possible but not practical. I am looking for a technique to overcome this issue.
For example, in a data set of car specifications, running association rule mining would require the car color attribute to be binary, and since there are many colors, each one would have to be turned into its own binary attribute (my data set is insurance claims, and it is much worse than this example).
Upvotes: 2
Views: 2261
Reputation: 77485
Association rule mining doesn't use "attributes". It processes market-basket-type data. It does not make sense to preprocess your data into binary attributes, because you would then need to convert the binary attributes back into items (in the worst case, you would translate your "color=blue" item into "color_red=0, color_black=0, ..., color_blue=1" if you are also looking for negative rules).
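As a rough sketch of what market-basket form looks like for mixed data (the car records and the `to_transaction` helper are hypothetical, made up for illustration), each categorical attribute can become a single `attribute=value` item and each true binary attribute a plain item, so no binary expansion of the colors is needed:

```python
from typing import Dict, List, Set, Union

Value = Union[str, int, bool]

# Hypothetical car records mixing categorical and binary attributes.
records: List[Dict[str, Value]] = [
    {"color": "blue", "doors": 4, "turbo": True},
    {"color": "red", "doors": 2, "turbo": False},
    {"color": "blue", "doors": 2, "turbo": True},
]


def to_transaction(record: Dict[str, Value]) -> Set[str]:
    """Turn one record into a basket of items.

    Categorical attributes become a single 'attribute=value' item.
    Binary attributes contribute an item only when they are true.
    """
    items: Set[str] = set()
    for attr, value in record.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            if value:
                items.add(attr)
        else:
            items.add(f"{attr}={value}")
    return items


transactions = [to_transaction(r) for r in records]
print(sorted(transactions[0]))  # ['color=blue', 'doors=4', 'turbo']

# For comparison: a one-hot table needs one column per distinct categorical
# value, while each basket above holds one item per categorical attribute.
one_hot_columns = sorted(
    {f"{a}={v}" for r in records for a, v in r.items() if not isinstance(v, bool)}
)
print(len(one_hot_columns))  # 4
```

The number of distinct items still grows with the number of distinct values, but each transaction stays short, which is what frequent itemset miners care about.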
Different algorithms (and, unfortunately, different implementations of the same theoretical algorithm) will scale very differently.
APRIORI is designed to scale well with the number of transactions, but not very well with the number of distinct items that have minimum support, in particular if you expect only short itemsets to be frequent. Other algorithms such as Eclat and FP-Growth may be much better there. But YMMV.
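To illustrate how Eclat differs from APRIORI, here is a minimal pure-Python sketch (not an optimized or library implementation; the `eclat` function and its absolute-count `min_support` parameter are assumptions for illustration). It stores a vertical layout, one set of transaction ids per item, and finds the support of larger itemsets by intersecting those sets depth-first instead of rescanning the data for each level:

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Mine frequent itemsets via tid-list intersections.

    min_support is an absolute count of transactions.
    """
    # Vertical layout: item -> ids of the transactions that contain it.
    tidlists: Dict[str, Set[int]] = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidlists.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Each (item, tids) pair is already frequent when combined with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Grow the itemset by intersecting with the remaining items' tid-lists.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, count in sorted(eclat(baskets, 3).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Memory use depends on the size of the tid-lists, so with many frequent items this trades APRIORI's candidate-counting passes for set intersections.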
First, try to convert the data set into a market basket format in which every item you keep is one you consider relevant. Discard everything else. Then start with a high minimum support and lower it gradually until you start getting results. Running with too low a minimum support may simply run out of memory or take a long time.
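The workflow above could be sketched as follows, under stated assumptions: the miner is a plain level-wise APRIORI written for illustration (not any particular library), and `mine_with_decreasing_support`, `keep`, `start`, `factor`, `floor`, and `min_size` are hypothetical names. The optional `keep` predicate discards irrelevant items, and the relative support is halved from a high starting value until itemsets of at least `min_size` items appear:

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Itemsets = Dict[FrozenSet[str], int]


def apriori(transactions: List[Set[str]], min_count: int) -> Itemsets:
    """Plain level-wise APRIORI: all itemsets with count >= min_count."""
    counts: Itemsets = {}
    for basket in transactions:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        candidates = set()
        # Join frequent (k-1)-itemsets, then prune any candidate that has an
        # infrequent (k-1)-subset (downward closure).
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in level for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the data counts all candidates of this level.
        counts = {cand: 0 for cand in candidates}
        for basket in transactions:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent


def mine_with_decreasing_support(
    transactions: List[Set[str]],
    keep: Optional[Callable[[str], bool]] = None,
    start: float = 0.9,
    factor: float = 0.5,
    floor: float = 0.01,
    min_size: int = 2,
) -> Tuple[Optional[float], Itemsets]:
    """Lower the relative minimum support until itemsets of min_size appear."""
    if keep is not None:
        # Discard items that are not relevant before mining.
        transactions = [{i for i in b if keep(i)} for b in transactions]
    n = len(transactions)
    support = start
    while support >= floor:
        min_count = max(1, math.ceil(support * n))
        found = {
            s: c
            for s, c in apriori(transactions, min_count).items()
            if len(s) >= min_size
        }
        if found:
            return support, found
        support *= factor
    return None, {}


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
support, found = mine_with_decreasing_support(baskets)
print(support, sorted(sorted(s) for s in found))
```

Stopping at `floor` keeps a too-low support from being tried at all; on real data you would likely choose `start`, `factor`, and `floor` based on how many transactions you have and how much memory is available.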
Also, make sure to get a good implementation. Many things that claim to be APRIORI implement only half of it and are incredibly slow.
Upvotes: 2