Rob

Reputation: 3459

Logistic Regression in scikit-learn

How do you handle graphs like this:

[image]

using scikit-learn's LogisticRegression model? Is there an easy way to handle these sorts of problems in scikit-learn with a standard X, y input that maps to a graph like this?

Upvotes: 4

Views: 818

Answers (3)

piman314

Reputation: 5355

There have been a couple of answers already, but neither of them has mentioned any preprocessing of the data, so I will show both ways of looking at your problem.

First up, I'll use some manifold learning to transform your data into another space.

# Do some imports that I'll be using
import numpy as np  # needed for the decision-boundary plot further down
from sklearn import datasets, manifold, linear_model
from sklearn import model_selection, ensemble, metrics
from matplotlib import pyplot as plt

%matplotlib inline

# Make some data that looks like yours
X, y = datasets.make_circles(n_samples=200, factor=.5,
                             noise=.05)

First of all let's look at your current problem

# Plot the raw data, then cross-validate a plain logistic regression on it
plt.scatter(X[:, 0], X[:, 1], c=y)
clf = linear_model.LogisticRegression()
scores = model_selection.cross_val_score(clf, X, y)
print(scores.mean())

Outputs:

Scatter plot of your data

0.440433749257

So you can see this data looks like yours, and we get a terrible cross-validated accuracy with logistic regression. If you're really attached to logistic regression, what we can do is project your data into a different space using some sort of manifold learning, for example:

# Embed the data with LLE, then cross-validate logistic regression on the embedding
Xd = manifold.LocallyLinearEmbedding().fit_transform(X)
plt.scatter(Xd[:, 0], Xd[:, 1], c=y)
clf = linear_model.LogisticRegression()
scores = model_selection.cross_val_score(clf, Xd, y)
print(scores.mean())

Outputs:

Scatter plot of the transformed data (Xd)

1.0

So you can see that your data is now perfectly linearly separable after the LocallyLinearEmbedding, and we get a much better classifier accuracy!
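One caveat with the snippet above: the embedding is fitted on all of the data before cross-validation, so every held-out fold has already influenced the transform. Here is a minimal sketch of one way to avoid that, assuming you are happy to wrap both steps in a scikit-learn pipeline. The names embed_then_classify and pipeline_scores are my own, the random_state is only there for reproducibility, and I haven't tuned n_neighbors, so the score may well differ from the one above.

```python
from sklearn import datasets, linear_model, manifold, model_selection, pipeline

# Same kind of toy data as above (random_state is an assumption, for reproducibility)
X, y = datasets.make_circles(n_samples=200, factor=.5, noise=.05,
                             random_state=0)

# The embedding is re-fitted on the training part of every fold,
# and the held-out part is only passed through transform()
embed_then_classify = pipeline.make_pipeline(
    manifold.LocallyLinearEmbedding(n_neighbors=5, n_components=2),
    linear_model.LogisticRegression(),
)
pipeline_scores = model_selection.cross_val_score(embed_then_classify, X, y, cv=5)
print(pipeline_scores.mean())
```

If this score is much lower than the one computed on the pre-embedded data, that is a hint the embedding does not carry over well to unseen points.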

The other option available to you, which other people have mentioned, is to use a different model. While there are many options available, I'm just going to show an example using RandomForestClassifier. I'm only going to train on half the data so we can evaluate the accuracy on an unbiased set. I only used CV previously because it's quick and easy!

# Train on the first half of the samples, evaluate on the second half
clf = ensemble.RandomForestClassifier().fit(X[:100], y[:100])
print(metrics.accuracy_score(y[100:], clf.predict(X[100:])))

Outputs:

0.97
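The slicing above works because make_circles shuffles its samples by default. If your own data is ordered by class, a hedged alternative is to let train_test_split do the shuffling and stratification for you. The random_state values and variable names below are just my choices for reproducibility:

```python
from sklearn import datasets, ensemble, metrics, model_selection

X, y = datasets.make_circles(n_samples=200, factor=.5, noise=.05,
                             random_state=0)

# Shuffle and split 50/50, keeping the class balance the same in both halves
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

rf = ensemble.RandomForestClassifier(random_state=0).fit(X_train, y_train)
holdout_accuracy = metrics.accuracy_score(y_test, rf.predict(X_test))
print(holdout_accuracy)
```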

So we're getting a good accuracy! If you're interested to see what's going on, we can lift some code from one of the awesome scikit-learn tutorials.

# Build a fine grid that covers the data with a small margin
plot_step = 0.02
x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1
y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))

# Predict a class for every grid point and draw the regions behind the data
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, alpha=0.5)
plt.scatter(X[:, 0], X[:, 1], c=y)

Outputs:

Decision boundary of RF classifier

So this shows the areas of your space that are being classified into each class using the Random Forest model.

Two ways to solve the same problem. I leave working out which is best as an exercise to the reader...

Upvotes: 2

Mikhail Korobov

Reputation: 22238

As others said, Logistic Regression can't handle this kind of data well because it is a linear classifier. You can either transform the data to make it linearly separable, or choose another classifier that is better suited to this kind of data.
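As a rough sketch of the "transform the data" route, one option (my assumption, not something from the question) is to add squared features, since a circle x1² + x2² = c becomes a straight line in those terms. The names quadratic_clf and quadratic_scores are made up for the example, and the data is a make_circles stand-in for yours:

```python
from sklearn import datasets, linear_model, model_selection
from sklearn import pipeline, preprocessing

# Toy concentric-circle data standing in for the data in the question
X, y = datasets.make_circles(n_samples=200, factor=.5, noise=.05,
                             random_state=0)

# Expand (x1, x2) into all terms up to degree 2, then fit a linear classifier
quadratic_clf = pipeline.make_pipeline(
    preprocessing.PolynomialFeatures(degree=2),
    linear_model.LogisticRegression(),
)
quadratic_scores = model_selection.cross_val_score(quadratic_clf, X, y, cv=5)
print(quadratic_scores.mean())
```

The classifier is still logistic regression; only the input space has changed.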

There is a nice visualisation of how various classifiers handle this problem in the scikit-learn docs: see http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html. The second row corresponds to your task:

[image]

Upvotes: 1

geompalik

Reputation: 1582

A promising approach, if you really want to use Logistic Regression in this particular setting, would be to transform your coordinates from the Cartesian system to the polar system. From the visualization, it seems that in that system your data will be (almost) linearly separable.

This can be done as described here: Python conversion between coordinates
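A minimal sketch of that idea, assuming the rings are centred on the origin (if they are not, subtract the centre first). The data comes from make_circles as a stand-in for the plot in the question, and X_polar and polar_scores are names I made up:

```python
import numpy as np
from sklearn import datasets, linear_model, model_selection

# Toy concentric-circle data standing in for the data in the question
X, y = datasets.make_circles(n_samples=200, factor=.5, noise=.05,
                             random_state=0)

# Radius and angle of every point, measured from the origin
r = np.hypot(X[:, 0], X[:, 1])
theta = np.arctan2(X[:, 1], X[:, 0])
X_polar = np.column_stack([r, theta])

# In (r, theta) space the two rings differ mainly in r
polar_scores = model_selection.cross_val_score(
    linear_model.LogisticRegression(), X_polar, y, cv=5)
print(polar_scores.mean())
```

The angle adds little for rings like these, so using the radius alone as a single feature would likely work just as well.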

Upvotes: 2
