AdirSolo

Reputation: 39

Can I make linear regression predict like a classifier?

I trained a linear regression model (using sklearn with Python 3). My training set had 94 features and the class of each sample was 0 or 1. Then I checked my linear regression model on the test set and it gave me these results:

1.[ 0.04988957] its real value is 0 on the test set

2.[ 0.00740425] its real value is 0 on the test set

3.[ 0.01907946] its real value is 0 on the test set

4.[ 0.07518938] its real value is 0 on the test set

5.[ 0.15202335] its real value is 0 on the test set

6.[ 0.04531345] its real value is 0 on the test set

7.[ 0.13394644] its real value is 0 on the test set

8.[ 0.16460608] its real value is 1 on the test set

9.[ 0.14846777] its real value is 0 on the test set

10.[ 0.04979875] its real value is 0 on the test set

As you can see, at row 8 it gave the highest value, but the thing is that I want to call my_model.predict(testData) and have it return only 0 or 1 as results. How can I possibly do that? Does the model have any threshold or automatic cutoff that I can use?
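For illustration, this is roughly the kind of manual cutoff I mean (0.5 is just an arbitrary threshold I picked, not something the model provides):

import numpy as np

continuous_preds = my_model.predict(testData)         # values like 0.049, 0.164, ...
binary_preds = (continuous_preds > 0.5).astype(int)   # turn them into 0/1 by hand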

Upvotes: 1

Views: 3921

Answers (3)

eickenberg

Reputation: 14377

There is a linear classifier, sklearn.linear_model.RidgeClassifier(alpha=0.), that you can use for this. Setting the Ridge penalty to 0. makes it do exactly the linear regression you want and sets the threshold to divide between classes.
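A minimal sketch of what that looks like (X_train, y_train and X_test are placeholders for your own 94-feature data, not from the question):

from sklearn.linear_model import RidgeClassifier

clf = RidgeClassifier(alpha=0.)   # penalty of 0 -> an ordinary least-squares fit
clf.fit(X_train, y_train)         # y_train holds the 0/1 labels
print(clf.predict(X_test))        # predictions come back as 0 or 1 directly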

Upvotes: 1

America

Reputation: 408

Logistic regression (see the scikit-learn or statsmodels implementation) is the right tool here; it outperforms OLS in most binary classification cases and its predicted probabilities naturally lie in the interval (0, 1).
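A minimal sketch with scikit-learn (variable names are placeholders, not from the question):

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf.fit(X_train, y_train)           # y_train holds the 0/1 labels
print(clf.predict(X_test))          # hard 0/1 predictions
print(clf.predict_proba(X_test))    # class probabilities in (0, 1) if you want them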

Upvotes: 1

sedavidw

Reputation: 11691

The LinearRegression class does not have a classifier built into it. However, there is an SGDClassifier (also a linear model) that can produce the predictions you want.

Example code from the documentation:

>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
        eta0=0.0, fit_intercept=True, l1_ratio=0.15,
        learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
        penalty='l2', power_t=0.5, random_state=None, shuffle=True,
        verbose=0, warm_start=False)
>>> print(clf.predict([[-0.8, -1]]))
[1]

Upvotes: 0
