Assume we have the following DataFrame, where A, B, C, and D are the binary outcomes of a classification task. "1" means "finished" and "0" means "not finished".
A B C D True
0 1 1 1 1
1 0 0 0 0
1 1 1 1 1
1 1 1 1 1
0 1 1 1 1
0 0 0 0 0
1 1 1 1 1
0 1 0 0 1
0 1 1 1 1
1 1 1 1 1
0 1 0 0 0
I wonder how well the True outcome can be predicted from the values in A, B, C, and D.
Should I apply a multivariate logistic regression with scikit-learn?
You could use sklearn's LogisticRegression:
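from sklearn.linear_model import LogisticRegression

# `data` is the DataFrame from the question; its label column is named 'True'.
# `data.True` is a syntax error (True is a Python keyword), so index with [].
endog = data['True'].values               # target: 1 = finished, 0 = not finished
exog = data.drop('True', axis=1).values   # features: A, B, C, D
model = LogisticRegression()
model.fit(exog, endog)
model.score(exog, endog)  # mean accuracy on the training data
# 0.90909090909090906
model.predict(exog)  # your predicted values
# array([1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1], dtype=int64)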
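To see how much each of A, B, C, and D pushes the prediction, you can look at the fitted coefficients. The sketch below types the question's table in by hand (assuming the label column is literally named 'True') and prints one coefficient per feature plus the intercept. Keep in mind that LogisticRegression applies an L2 penalty by default (C=1.0), so the coefficients are shrunk toward zero. Also, C and D are identical in this sample, so the model cannot separate their effects and splits the weight between them.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# The question's table, typed in by hand (column names assumed as shown).
data = pd.DataFrame({
    'A':    [0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    'B':    [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    'C':    [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    'D':    [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    'True': [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})
features = ['A', 'B', 'C', 'D']
X = data[features].values
y = data['True'].values

# Default settings: L2 penalty with C=1.0, so coefficients are shrunk toward 0.
model = LogisticRegression()
model.fit(X, y)

# Each coefficient is the change in log-odds of "finished" when that
# feature goes from 0 to 1, with the other features held fixed.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: {coef:+.3f}")
print(f"intercept: {model.intercept_[0]:+.3f}")

# Column 1 of predict_proba is P(True == 1), because model.classes_ is [0, 1].
proba = model.predict_proba(X)[:, 1]
print(proba.round(3))
```

Rows with identical features (here rows 8 and 11, both 0 1 0 0) always get the same probability. That is why in-sample accuracy cannot exceed 10/11 on this data.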
Keep in mind that in this example you are training a statistical model and then predicting on the same (in-sample) data you already fed it. That is generally regarded as shabby statistical practice, because in-sample accuracy tends to overstate how well the model will do on new data. Proceed with caution, or test on out-of-sample data.
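One way to get an out-of-sample estimate from a table this small is k-fold cross-validation. The sketch below is one possible approach, not the only one. It uses sklearn's cross_val_score with cv=3. For a classifier, an integer cv means stratified folds. Three folds is the most this data allows, because the "0" class has only three rows. With 11 rows the per-fold accuracies will be noisy, so treat the mean as a rough indication only.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# The question's table, typed in by hand (column names assumed as shown).
data = pd.DataFrame({
    'A':    [0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0],
    'B':    [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    'C':    [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    'D':    [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    'True': [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})
X = data.drop('True', axis=1).values  # features A, B, C, D
y = data['True'].values               # label: 1 = finished, 0 = not finished

# Stratified 3-fold CV: each row is scored exactly once, by a model
# that was fit without that row.
scores = cross_val_score(LogisticRegression(), X, y, cv=3, scoring='accuracy')
print(scores)         # accuracy on each held-out fold
print(scores.mean())  # out-of-sample estimate, to compare with model.score
```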