Reputation: 453
I'm getting started with arulesSequences, with the aim of performing frequent sequence mining on some data I have. The data for a store A looks like this:
  CUSTOMER_ID seq_num Size bought_items
1       17399       1    2 {100,100}
2       17399       2    1 {800}
3       17399       3    2 {900,900}
4       17399       4    1 {405}
5       17399       5    4 {200,505,200,505}
What this means is that this customer #17399 shopped with this store A on multiple occasions. During his/her first shopping trip, this person bought items with item codes 100 and 100 (2 items). During his/her second shopping trip, this customer bought just the item 800. And so on.
Now I want to use cSPADE on this customer, where order doesn't matter within a "basket" but does matter across shopping trips. So eventually my record for customer 17399 would be:
CUSTOMER_ID bought_items
17399 {(100,100),800,(900,900),405,(200,505,200,505)}
Where {} contains the full sequence and () represents each shopping trip.
I understand that this is possible in general. However, after a few hours of searching, I haven't found any examples or notes that explicitly say arulesSequences supports it.
Upvotes: 3
Views: 1206
Reputation: 453
After several hours of study, I'm adding the answer I've found, in case it is useful to others.
The answer is yes: the package does support repeated items across baskets. In fact, the example at https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Sequence_Mining/SPADE shows this case. In that example, no two baskets belonging to the same sequence are identical, but they do share items. Even when I edited the example input .txt so that they were identical, there was no error from read_baskets and cspade, which are the functions I was trying to apply.
A lot of examples on the web are for apriori, which does not allow repeated items within a basket, and this causes a lot of confusion. The Wikibooks example linked above is a good one because it shows the use of cSPADE instead. Hope this helps folks out there.
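For anyone who wants to try this quickly, here is a minimal sketch of the read_baskets plus cspade workflow on the data from the question. Several things in it are my own assumptions rather than anything from the package docs: the helper to_basket_lines is hypothetical, customer 20000 is made up so that support is computed over more than one sequence, and the support threshold is arbitrary. I also collapse duplicates within a basket up front, since an itemset is a set and I have not checked how every arules version treats within-basket duplicates.

```r
library(arulesSequences)

# Hypothetical helper: turn one customer's list of baskets into lines of the form
# "sequenceID eventID SIZE item item ...", which is the layout read_baskets() parses
# when info = c("sequenceID", "eventID", "SIZE").
to_basket_lines <- function(customer_id, baskets) {
  items <- lapply(baskets, unique)  # assumption: drop duplicates within a basket
  sprintf("%d %d %d %s",
          customer_id,
          seq_along(items),         # eventID: trip order, strictly increasing
          lengths(items),           # SIZE: number of distinct items in the trip
          vapply(items, paste, character(1), collapse = " "))
}

# Customer 17399 comes from the question; customer 20000 is invented for illustration.
basket_lines <- c(
  to_basket_lines(17399L, list(c(100, 100), 800, c(900, 900), 405,
                               c(200, 505, 200, 505))),
  to_basket_lines(20000L, list(100, c(800, 900), 505))
)

# Write the lines to a temporary file, sorted by sequenceID and then eventID.
path <- tempfile(fileext = ".txt")
writeLines(basket_lines, path)

# Read the baskets and mine sequences that occur for every customer
# (support = 1 is an arbitrary choice for this tiny data set).
trips <- read_baskets(path, info = c("sequenceID", "eventID", "SIZE"))
patterns <- cspade(trips, parameter = list(support = 1),
                   control = list(verbose = FALSE))

# Inspect the frequent sequences and their support.
as(patterns, "data.frame")
```

Items such as 100 or 505 appear in several trips here, which is the "repeated across baskets" case from the question. Lowering support will return more (and longer) sequences.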
Upvotes: 3