How to get the equation of the boundary line in Linear Discriminant Analysis with sklearn

Question

I classified some data that falls into 2 categories with the LinearDiscriminantAnalysis classifier from sklearn, and it works well. This is what I did:

# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed)
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Hold out 25% of the dataset; it is not used for training
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
clf = LDA()
clf.fit(x_train, y_train)

Then I managed to make predictions with it, and that part works fine.

But all of that is in an IPython notebook, and I'd like to use the classifier elsewhere. I've seen that pickle and joblib are options, but as I only have 2 groups and 2 features, I thought I could just get the equation of the boundary line and then check whether a given point is above or below the line to tell which group it belongs to.

From what I understood, this line is orthogonal to the projection line and goes through the midpoint of the clusters' means. I think I got that midpoint with np.mean(clf.means_, axis=0).

But here I'm stuck on how to use attributes like clf.coef_, clf.intercept_, etc. to find the equation of the projection line.

So my question is: how can I get the boundary line equation from my classifier?

It is also possible that I did not understand LDA properly, and I'd be delighted to have more explanations.

Thanks

lejlot · Accepted Answer

The decision boundary is simply the line given by

np.dot(clf.coef_, x) + clf.intercept_ = 0

as this is where the sign of the decision function flips. sklearn's decision_function adds the intercept, as written here; other implementations may define the intercept with the opposite sign.
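For the 2-feature, 2-class case in the question, the equation above can be rearranged into the familiar form x2 = slope * x1 + offset. Below is a minimal sketch of that rearrangement, assuming sklearn's convention that the decision value is w1*x1 + w2*x2 + b, with (w1, w2) taken from clf.coef_[0] and b from clf.intercept_[0]. The helper names boundary_line, decision_value and predict_label are hypothetical, and the numbers are made up for illustration rather than taken from a fitted model.

```python
def boundary_line(coef, intercept):
    """Return (slope, offset) of the line x2 = slope * x1 + offset
    on which w1*x1 + w2*x2 + b equals zero.

    coef is a pair (w1, w2), e.g. clf.coef_[0];
    intercept is the scalar b, e.g. clf.intercept_[0].
    """
    w1, w2 = (float(w) for w in coef)
    b = float(intercept)
    if w2 == 0.0:
        # Vertical boundary (x1 = -b / w1): no slope/offset form exists.
        raise ValueError("boundary is vertical; use x1 = -b / w1 instead")
    return -w1 / w2, -b / w2


def decision_value(coef, intercept, point):
    """Signed score w1*x1 + w2*x2 + b; it is zero exactly on the boundary."""
    w1, w2 = coef
    x1, x2 = point
    return w1 * x1 + w2 * x2 + intercept


def predict_label(coef, intercept, point, classes=(0, 1)):
    """Pick classes[1] when the score is positive, classes[0] otherwise.

    classes stands in for clf.classes_ (assumed to hold two labels).
    """
    return classes[1] if decision_value(coef, intercept, point) > 0 else classes[0]


# Made-up weights for illustration; with a fitted model these would be
# clf.coef_[0] and clf.intercept_[0].
example_coef = (2.0, -4.0)
example_intercept = 1.0

slope, offset = boundary_line(example_coef, example_intercept)
print(slope, offset)  # 0.5 0.25
print(predict_label(example_coef, example_intercept, (0.0, 0.0)))  # 1
print(predict_label(example_coef, example_intercept, (0.0, 1.0)))  # 0
```

Note that "above the line" does not always mean the same class: it depends on the sign of w2. In the made-up example w2 is negative, so points above the line get a negative score and fall in classes[0]. Checking the sign of the score directly avoids having to reason about that.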
