Reputation: 39
I trained a linear regression model (using sklearn with Python 3). My training set had 94 features, and the class of each sample was 0 or 1. Then I checked my linear regression model on the test set and it gave me these results:
1. [0.04988957] (real value on the test set: 0)
2. [0.00740425] (real value on the test set: 0)
3. [0.01907946] (real value on the test set: 0)
4. [0.07518938] (real value on the test set: 0)
5. [0.15202335] (real value on the test set: 0)
6. [0.04531345] (real value on the test set: 0)
7. [0.13394644] (real value on the test set: 0)
8. [0.16460608] (real value on the test set: 1)
9. [0.14846777] (real value on the test set: 0)
10. [0.04979875] (real value on the test set: 0)
As you can see, row 8 gave the highest value, but I want to call my_model.predict(testData) and have it return only 0 or 1 as results. How can I do that? Does the model have a threshold or automatic cutoff I can use?
Upvotes: 1
Views: 3921
Reputation: 14377
There is a linear classifier, sklearn.linear_model.RidgeClassifier(alpha=0.),
that you can use for this. Setting the Ridge penalty to 0 makes it do exactly the linear regression you want, and it sets the threshold to divide between the classes.
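A minimal sketch of that approach, assuming your training and test features live in hypothetical arrays X_train, y_train, and X_test (the random toy data here only stands in for your real 94-feature matrices):

import numpy as np
from sklearn.linear_model import RidgeClassifier

# Toy stand-in data; substitute your own train/test matrices.
X_train = np.random.rand(20, 94)
y_train = np.random.randint(0, 2, size=20)
X_test = np.random.rand(5, 94)

# alpha=0. turns off the Ridge penalty, so the fit is ordinary least squares;
# predict() then thresholds the continuous output and returns 0 or 1 directly.
clf = RidgeClassifier(alpha=0.)
clf.fit(X_train, y_train)
print(clf.predict(X_test))  # array of 0s and 1s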
Upvotes: 1
Reputation: 408
Logistic regression (see the scikit-learn or statsmodels implementation) is the right tool here; it outperforms OLS for this kind of classification in most cases, and its predictions naturally lie in the interval (0, 1).
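A quick sketch, again with hypothetical placeholder arrays standing in for your data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; substitute your own 94-feature train/test matrices.
X_train = np.random.rand(20, 94)
y_train = np.random.randint(0, 2, size=20)
X_test = np.random.rand(5, 94)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(clf.predict(X_test))        # hard 0/1 labels
print(clf.predict_proba(X_test))  # class probabilities in (0, 1)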
Upvotes: 1
Reputation: 11691
The LinearRegression class does not do classification. However, there is SGDClassifier (also a linear model) that can produce the 0/1 predictions you want.
Example code from the documentation:
>>> import numpy as np
>>> from sklearn import linear_model
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> Y = np.array([1, 1, 2, 2])
>>> clf = linear_model.SGDClassifier()
>>> clf.fit(X, Y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
eta0=0.0, fit_intercept=True, l1_ratio=0.15,
learning_rate='optimal', loss='hinge', n_iter=5, n_jobs=1,
penalty='l2', power_t=0.5, random_state=None, shuffle=True,
verbose=0, warm_start=False)
>>> print(clf.predict([[-0.8, -1]]))
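If you would rather keep the LinearRegression model you already trained, a simple alternative (not from the documentation, just a sketch reusing the my_model and testData names from the question) is to threshold its continuous predictions yourself:

import numpy as np

continuous = my_model.predict(testData)   # continuous values like 0.049, 0.164, ...
labels = (continuous >= 0.5).astype(int)  # 0/1 labels; 0.5 is an arbitrary cutoff you can tune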
Upvotes: 0