Reputation: 11
I'm trying to mine frequent itemsets in a dataset which has itemsets of products frequently sold together.
Example itemsets:
A,B,C,D,E
A,B
B,E
A,B
B,C
B,C,E
A,C,F,G
D,H
I,J,K,L
A,J,K
L,C,F
C,B
I use either the apriori or the eclat function to get the itemsets:
rules <- apriori(tr, parameter = list(supp=0.01, conf=0.5,target="frequent itemsets"))
Is there any way to restrict the mining to transactions of a specific length (order), for example only transactions with exactly 2 items, or only those with exactly 3 items, and so on?
So, for example, when I mine frequent itemsets from transactions of length 2, I should only see:
itemset  count
A,B      2   (not 3, because {A,B,C,D,E} doesn't qualify)
B,E      1
B,C      2
D,H      1
Upvotes: 1
Views: 336
Reputation: 3075
If I understand you correctly, you want to create the transactions and then subset them so that only transactions containing exactly 2 items are retained. This is how you do it:
library('arules')
trans_list <- list(
  c('A', 'B', 'C', 'D', 'E'),
  c('A', 'B'),
  c('B', 'E'),
  c('A', 'B'),
  c('B', 'C'),
  c('B', 'C', 'E'),
  c('A', 'C', 'F', 'G'),
  c('D', 'H'),
  c('I', 'J', 'K', 'L'),
  c('A', 'J', 'K'),
  c('L', 'C', 'F'),
  c('C', 'B')
)
Create transactions from the list
trans <- as(trans_list, "transactions")
trans
#> transactions in sparse format with
#> 12 transactions (rows) and
#> 12 items (columns)
inspect(head(trans))
#> items
#> [1] {A,B,C,D,E}
#> [2] {A,B}
#> [3] {B,E}
#> [4] {A,B}
#> [5] {B,C}
#> [6] {B,C,E}
Select only transactions with a size of 2 items
trans_2 <- subset(trans, size(trans) == 2)
trans_2
#> transactions in sparse format with
#> 6 transactions (rows) and
#> 12 items (columns)
inspect(head(trans_2))
#> items
#> [1] {A,B}
#> [2] {B,E}
#> [3] {A,B}
#> [4] {B,C}
#> [5] {D,H}
#> [6] {B,C}
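The same subsetting idea extends to other lengths. As a minimal sketch, the block below tabulates how many transactions exist for each length and then keeps only those of a chosen length. The helper name `transactions_of_length` is made up for illustration and is not part of arules; only `size()` and the usual `[` indexing on transactions are assumed.

```r
library(arules)

# Example transactions from the question
trans_list <- list(
  c("A", "B", "C", "D", "E"), c("A", "B"), c("B", "E"), c("A", "B"),
  c("B", "C"), c("B", "C", "E"), c("A", "C", "F", "G"), c("D", "H"),
  c("I", "J", "K", "L"), c("A", "J", "K"), c("L", "C", "F"), c("C", "B")
)
trans <- as(trans_list, "transactions")

# Number of items in each transaction
sizes <- size(trans)

# How many transactions exist for each length
size_counts <- table(sizes)
print(size_counts)

# Hypothetical helper (not an arules function):
# keep only the transactions that contain exactly k items
transactions_of_length <- function(x, k) {
  x[size(x) == k]
}

trans_3 <- transactions_of_length(trans, 3)
inspect(trans_3)
```

Checking `size_counts` first is a cheap way to see which lengths actually occur before mining a subset that might be empty.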
Mine frequent itemsets
itemsets <- apriori(trans_2, parameter = list(supp=0.01, conf=0.5,target="frequent itemsets"))
#> Apriori
#>
#> Parameter specification:
#> confidence minval smax arem aval originalSupport maxtime support minlen
#> NA 0.1 1 none FALSE TRUE 5 0.01 1
#> maxlen target ext
#> 10 frequent itemsets FALSE
#>
#> Algorithmic control:
#> filter tree heap memopt load sort verbose
#> 0.1 TRUE TRUE FALSE TRUE 2 TRUE
#>
#> Absolute minimum support count: 0
#>
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[6 item(s), 6 transaction(s)] done [0.00s].
#> sorting and recoding items ... [6 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 done [0.00s].
#> writing ... [10 set(s)] done [0.00s].
#> creating S4 object ... done [0.00s].
inspect(itemsets)
#> items support count
#> [1] {E} 0.1666667 1
#> [2] {D} 0.1666667 1
#> [3] {H} 0.1666667 1
#> [4] {A} 0.3333333 2
#> [5] {C} 0.3333333 2
#> [6] {B} 0.8333333 5
#> [7] {B,E} 0.1666667 1
#> [8] {D,H} 0.1666667 1
#> [9] {A,B} 0.3333333 2
#> [10] {B,C} 0.3333333 2
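The result above still includes single items such as {B}, whereas the question asks only for pairs. One possible way to get just the pairs is to set `minlen` and `maxlen` to the same value as the transaction length. The sketch below wraps this in a helper, `mine_exact_length`, which is a hypothetical name and not part of arules. It assumes that both `apriori` and `eclat` accept `supp`, `minlen`, `maxlen` and `target` in `parameter`, and `verbose` in `control`. `conf` is left out because the parameter specification above reports confidence as NA for this target.

```r
library(arules)

# Example transactions from the question
trans_list <- list(
  c("A", "B", "C", "D", "E"), c("A", "B"), c("B", "E"), c("A", "B"),
  c("B", "C"), c("B", "C", "E"), c("A", "C", "F", "G"), c("D", "H"),
  c("I", "J", "K", "L"), c("A", "J", "K"), c("L", "C", "F"), c("C", "B")
)
trans <- as(trans_list, "transactions")

# Hypothetical helper (not an arules function):
# mine itemsets of exactly k items, using only transactions of exactly k items.
# `miner` is expected to be arules::apriori or arules::eclat.
mine_exact_length <- function(x, k, miner = apriori, supp = 0.01) {
  x_k <- x[size(x) == k]               # transactions with exactly k items
  miner(
    x_k,
    parameter = list(supp = supp, minlen = k, maxlen = k,
                     target = "frequent itemsets"),
    control = list(verbose = FALSE)    # suppress the progress log
  )
}

# Pairs mined from the 2-item transactions with apriori
pairs_apriori <- mine_exact_length(trans, 2)
inspect(pairs_apriori)

# Same call with eclat, which the question also mentions
pairs_eclat <- mine_exact_length(trans, 2, miner = eclat)
inspect(pairs_eclat)
```

Note that support is computed relative to the subset, not the full dataset, so {A,B} has support 2/6 here. Filtering an existing result with `itemsets[size(itemsets) == 2]` should be an equivalent alternative to setting `minlen` and `maxlen`.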
Upvotes: 1