Reputation: 14054
I'm looking at this data set: https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data
I preprocessed the data:
library(arules)  # discretize(), apriori() and inspect() come from arules
ca.1<-read.csv("CreditApproval.csv", header = TRUE, sep = ",")
# From http://stackoverflow.com/q/4787332/
remove_outliers <- function(x, na.rm = TRUE, ...) {
qnt <- quantile(x, probs=c(.25, .75), na.rm = na.rm, ...)
H <- 1.5 * IQR(x, na.rm = na.rm)
y <- x
y[x < (qnt[1] - H)] <- NA
y[x > (qnt[2] + H)] <- NA
y
}
ca.1$A2<-remove_outliers(ca.1$A2)
ca.1$A3<-remove_outliers(ca.1$A3)
ca.1$A8<-remove_outliers(ca.1$A8)
ca.1$A11<-remove_outliers(ca.1$A11)
ca.1$A14<-remove_outliers(ca.1$A14)
ca.1$A15<-remove_outliers(ca.1$A15)
ca.1$A2<-discretize(ca.1$A2,"frequency",categories = 6)
ca.1$A3<-discretize(ca.1$A3,"frequency",categories = 6)
ca.1$A8<-discretize(ca.1$A8,"frequency",categories = 6)
ca.1$A11<-discretize(ca.1$A11,"frequency",categories = 6)
ca.1$A14<-discretize(ca.1$A14,"frequency",categories = 6)
ca.1$A15<-discretize(ca.1$A15,"frequency",categories = 6)
ca.1<-na.omit(ca.1)
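To see what the two preprocessing steps do, here is a small sketch on a made-up vector (x is hypothetical, not a column of the credit data). It assumes a recent arules release, where discretize() takes breaks = in place of the older categories = argument used above.

```r
library(arules)  # for discretize()

# Same IQR-based helper as in the question: values outside
# [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are replaced with NA.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical numeric column: 1..59 plus one extreme value.
x <- c(1:59, 1000)
x_clean <- remove_outliers(x)   # the 1000 becomes NA

# Equal-frequency binning into 6 categories; NAs stay NA, which is
# why the question calls na.omit() afterwards.
x_binned <- discretize(x_clean, method = "frequency", breaks = 6)

print(table(x_binned, useNA = "ifany"))
```

Note that na.omit() on the whole data frame drops every row in which any column had an outlier, so the number of transactions can shrink noticeably.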
After fine-tuning support, confidence and minlen/maxlen, I'm still getting 65 rules:
> rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4), appearance=list(rhs=c("class=-", "class=+"), default="lhs"))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
    lhs                    rhs         support   confidence lift
[1] {A5=g,A9=t,A10=t}  =>  {class=+}   0.1521739 0.8974359  2.770607
[2] {A4=u,A9=t,A10=t}  =>  {class=+}   0.1521739 0.8974359  2.770607
[3] {A1=a,A9=f}        =>  {class=-}   0.1717391 0.9753086  1.442579
[4] {A1=a,A9=f,A13=g}  =>  {class=-}   0.1608696 0.9736842  1.440176
...[65]
As you can see, the class=+ rules have a greater lift but lower support and confidence than the class=- rules. I've been looking through the docs and can't find any parameter to limit the rules by lift. Is this possible? If not, what do you do in situations like this?
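The pattern in this output follows from the definition of lift: for a rule X => Y, lift = confidence / support(Y), where support(Y) is the fraction of rows containing Y. Rearranging recovers each class's frequency from the printed numbers, and also the largest lift a rule for that class could ever reach (1 / support(Y), attained at confidence 1). A small base-R check using values copied from the inspect() output:

```r
# Values copied from the inspect() output in the question.
conf_plus  <- 0.8974359; lift_plus  <- 2.770607   # rule [1], rhs class=+
conf_minus <- 0.9753086; lift_minus <- 1.442579   # rule [3], rhs class=-

# lift = confidence / support(rhs)  =>  support(rhs) = confidence / lift
p_plus  <- conf_plus / lift_plus     # implied share of class=+ rows
p_minus <- conf_minus / lift_minus   # implied share of class=- rows

# Upper bound on lift for each class: confidence cannot exceed 1.
max_lift <- c(plus = 1 / p_plus, minus = 1 / p_minus)

print(round(c(p_plus = p_plus, p_minus = p_minus), 4))
print(round(max_lift, 3))
```

The two implied shares sum to 1, so roughly a third of the rows are class=+. A class=- rule can therefore never exceed a lift of about 1.48, while a class=+ rule can reach about 3.09, so any single global lift threshold will favor the rarer class.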
Upvotes: 2
Views: 2677
Reputation: 438
Another way is to use arules::quality(). For example:
association.rules <- apriori(tr, parameter = list(support=0.005, confidence=0.25, minlen=3, maxlen=10))
subRules<-association.rules[quality(association.rules)$lift > 1]
quality() returns a data frame of the rules' interest measures (support, confidence, coverage, lift, count), so you can filter on any of them.
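A runnable sketch of the quality() approach. The answer's tr isn't shown, so this assumes the Adult transactions bundled with arules instead; the thresholds are arbitrary, and the exact set of columns quality() returns may vary between arules versions.

```r
library(arules)

# Adult ships with arules as a transactions object (stand-in for `tr`).
data("Adult")
rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, minlen = 2),
                 control = list(verbose = FALSE))

q <- quality(rules)          # one row per rule, one column per measure
print(colnames(q))

# Logical indexing works on any measure, or a combination of them.
strong <- rules[q$lift > 1 & q$confidence > 0.95]
print(length(strong))
```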
Upvotes: 2
Reputation: 355
apriori() can't limit rules by lift alone. You have to limit them by support and confidence first, which you did here:
rules<-apriori(ca.1, parameter= list(supp=0.15, conf=0.89, minlen=3, maxlen=4))
Then filter and sort the result by lift, for example:
rulesLift <- sort(subset(rules, subset = lift < 2), by="lift")
inspect(rulesLift)
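A self-contained version of this two-step approach, run on the Adult transactions that ship with arules since the credit data isn't available here. The items income=small and income=large stand in for class=- and class=+; the thresholds are arbitrary, and this sketch keeps rules above the lift cutoff rather than below it.

```r
library(arules)
data("Adult")

# Step 1: mine with support/confidence only, restricting the rhs to the
# two income items (mirrors the question's appearance argument).
rules <- apriori(Adult,
                 parameter = list(supp = 0.05, conf = 0.6, minlen = 2, maxlen = 4),
                 appearance = list(rhs = c("income=small", "income=large"),
                                   default = "lhs"),
                 control = list(verbose = FALSE))

# Step 2: post-filter on lift, then order by it (highest first).
rules_lift <- sort(subset(rules, subset = lift > 1.2), by = "lift")
print(length(rules_lift))
```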
Upvotes: 2
Reputation: 66
The arules package defines its own subset() method for this object type. To filter out rules with a lift value less than 2, you can try the following:
subset(rules, subset = lift > 2)
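Because subset() evaluates its expression against both the quality measures and the rule's item sets, you can also apply a different lift cutoff per class. That matters when the classes have very different frequencies, since a rare right-hand side can reach a much higher lift. A sketch on the bundled Adult data, with arbitrary thresholds and income=large / income=small standing in for the two classes:

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(supp = 0.05, conf = 0.6, minlen = 2),
                 appearance = list(rhs = c("income=small", "income=large"),
                                   default = "lhs"),
                 control = list(verbose = FALSE))

# Stricter lift cutoff for the rarer class, looser one for the common class.
keep <- subset(rules, subset = (rhs %in% "income=large" & lift > 2) |
                               (rhs %in% "income=small" & lift > 1.2))
print(length(keep))
```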
Upvotes: 5
Reputation: 422
I think the apriori function does not accept lift as a parameter. I get this error if I try to set it:
Error: Invalid parameter: lift
Instead, I sort the rules by lift and pick rules based on their lift value:
sort(rules, by="lift", decreasing=TRUE)
This is not a straightforward solution, but it is a decent workaround.
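If you'd rather not pick a numeric cutoff at all, the sorted rule set can simply be truncated to its top N entries. A minimal sketch on the Adult data bundled with arules (N = 5 is an arbitrary choice):

```r
library(arules)
data("Adult")

rules <- apriori(Adult,
                 parameter = list(supp = 0.5, conf = 0.9, minlen = 2),
                 control = list(verbose = FALSE))

# Highest lift first, then keep at most the first 5 rules.
by_lift <- sort(rules, by = "lift", decreasing = TRUE)
top5 <- by_lift[seq_len(min(5, length(by_lift)))]
inspect(top5)
```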
Upvotes: 0
Reputation: 4243
What if you tried:
apriori(df, parameter = list(lift = 0.3, minlen =2))
You can set your minimum lift to anything in this case; I just chose 0.3.
Upvotes: -1