dhruba ghosh

Reputation: 11

Apriori / Market Basket Analysis - limit analysis to itemsets of specific length

I'm trying to mine frequent itemsets in a dataset which has itemsets of products frequently sold together.


example itemsets:

A,B,C,D,E
A,B 
B,E
A,B     
B,C    
B,C,E    
A,C,F,G    
D,H
I,J,K,L    
A,J,K     
L,C,F
C,B

I use either the apriori or the eclat function to get the itemsets:

rules <- apriori(tr, parameter = list(supp=0.01, conf=0.5, target="frequent itemsets"))

Is there a way to mine these itemsets using only transactions of a specific length (order), for example only transactions with exactly 2 items, or exactly 3 items, and so on?

So, for example, when I mine frequent itemsets from transactions of length 2, I should only see:

itemset  count
A,B      2   (not 3, because {A,B,C,D,E} does not qualify)
B,E      1
B,C      2
D,H      1

Upvotes: 1

Views: 336

Answers (1)

Michael Hahsler

Reputation: 3075

If I understand correctly, you want to create transactions and then subset them so that only the transactions containing exactly 2 items remain. This is how you do it:

library('arules')

trans_list <- list(
  c('A', 'B', 'C', 'D', 'E'),
  c('A', 'B'), 
  c('B', 'E'),
  c('A', 'B'),     
  c('B', 'C'), 
  c('B', 'C', 'E'),    
  c('A', 'C', 'F', 'G'),    
  c('D', 'H'),
  c('I', 'J', 'K', 'L'),    
  c('A', 'J', 'K'),
  c('L', 'C', 'F'),
  c('C', 'B')
)

Create transactions from the list

trans <- as(trans_list, "transactions")
trans
#> transactions in sparse format with
#>  12 transactions (rows) and
#>  12 items (columns)

inspect(head(trans))
#>     items      
#> [1] {A,B,C,D,E}
#> [2] {A,B}      
#> [3] {B,E}      
#> [4] {A,B}      
#> [5] {B,C}      
#> [6] {B,C,E}

Select only transactions with a size of 2 items

trans_2 <- subset(trans, size(trans) == 2)
trans_2
#> transactions in sparse format with
#>  6 transactions (rows) and
#>  12 items (columns)

inspect(head(trans_2))
#>     items
#> [1] {A,B}
#> [2] {B,E}
#> [3] {A,B}
#> [4] {B,C}
#> [5] {D,H}
#> [6] {B,C}
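
The same subsetting idea extends to any transaction length. Below is a minimal sketch, assuming the example data from above; `transactions_of_size` is a hypothetical helper name introduced here for illustration and is not part of arules. It relies only on `size()` and logical indexing with `[`.

```r
library(arules)

# Same example data as above
trans <- as(list(
  c("A", "B", "C", "D", "E"), c("A", "B"), c("B", "E"), c("A", "B"),
  c("B", "C"), c("B", "C", "E"), c("A", "C", "F", "G"), c("D", "H"),
  c("I", "J", "K", "L"), c("A", "J", "K"), c("L", "C", "F"), c("C", "B")
), "transactions")

# Show how many transactions exist for each size
table(size(trans))

# Hypothetical helper (not an arules function):
# keep only the transactions that contain exactly k items
transactions_of_size <- function(x, k) x[size(x) == k]

trans_3 <- transactions_of_size(trans, 3)
inspect(trans_3)
```

Checking `table(size(trans))` first helps avoid mining a subset that turns out to be empty or very small.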

Mine frequent itemsets

itemsets <- apriori(trans_2, parameter = list(supp=0.01, conf=0.5,target="frequent itemsets"))
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>          NA    0.1    1 none FALSE            TRUE       5    0.01      1
#>  maxlen            target   ext
#>      10 frequent itemsets FALSE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 0 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[6 item(s), 6 transaction(s)] done [0.00s].
#> sorting and recoding items ... [6 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 done [0.00s].
#> writing ... [10 set(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].

inspect(itemsets)
#>      items support   count
#> [1]  {E}   0.1666667 1    
#> [2]  {D}   0.1666667 1    
#> [3]  {H}   0.1666667 1    
#> [4]  {A}   0.3333333 2    
#> [5]  {C}   0.3333333 2    
#> [6]  {B}   0.8333333 5    
#> [7]  {B,E} 0.1666667 1    
#> [8]  {D,H} 0.1666667 1    
#> [9]  {A,B} 0.3333333 2    
#> [10] {B,C} 0.3333333 2
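
The result above still includes single items such as {B}. If only the 2-item itemsets are wanted, one option is to also set `minlen` and `maxlen` in the parameter list. This is a minimal sketch, assuming the same example data; `pairs` is just an illustrative variable name, and `conf` is omitted because confidence does not apply when the target is frequent itemsets.

```r
library(arules)

# Same example data as above
trans <- as(list(
  c("A", "B", "C", "D", "E"), c("A", "B"), c("B", "E"), c("A", "B"),
  c("B", "C"), c("B", "C", "E"), c("A", "C", "F", "G"), c("D", "H"),
  c("I", "J", "K", "L"), c("A", "J", "K"), c("L", "C", "F"), c("C", "B")
), "transactions")

# Keep only the transactions with exactly 2 items
trans_2 <- trans[size(trans) == 2]

# minlen = maxlen = 2 restricts the mined itemsets themselves to 2 items
pairs <- apriori(
  trans_2,
  parameter = list(supp = 0.01, minlen = 2, maxlen = 2,
                   target = "frequent itemsets"),
  control = list(verbose = FALSE)  # suppress the progress log
)

inspect(sort(pairs, by = "support"))
```

With the example data this should leave only {A,B}, {B,C}, {B,E} and {D,H}, which are the four pairs listed in the question.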

Upvotes: 1
