SQL_M

Reputation: 2475

Making predictions with logistic regression (Python scikit-learn)

I am relatively new to logistic regression with scikit-learn in Python. After reading some topics and watching some demos, I decided to dive in myself.

So, basically, I am trying to predict the conversion rate of customers based on some features. The outcome is either Active (1) or Not active (0). I tried KNN and logistic regression: with KNN I get an average accuracy of 0.893, and with logistic regression 0.994. The latter seems very high; is that even realistic or possible?
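The cross-validated comparison described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: it assumes synthetic data from `make_classification` as a hypothetical stand-in for the customer features (which are not shown) and 5-fold cross-validation, so the scores it prints will not match the numbers above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the customer data: 500 rows, 6 features,
# binary target (1 = Active, 0 = Not active).
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

mean_scores = {}
for name, model in models.items():
    # cross_val_score fits a fresh clone of the model on each training fold
    # and returns one accuracy score per held-out fold.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```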

Anyway, suppose my model is indeed very accurate. I would now like to import a new dataset with the same feature columns and predict their conversions (they end this month). In the case above, I used cross_val_score to get the accuracy scores.

Do I now need to import the new set and somehow apply this model to it? (Not train it again; I just want to use it.)
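One detail that may explain why a separate step is needed: `cross_val_score` fits clones of the estimator, so the object passed to it is still unfitted afterwards. A common approach is to fit the model once on all the labeled data and then call `predict` (or `predict_proba`) on the new rows, with no further training. A minimal sketch, assuming synthetic data; `X_new` is a hypothetical stand-in for the new dataset with the same feature columns.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted

# Hypothetical labeled history (X, y) and new customers (X_new), same columns.
X_all, y_all = make_classification(n_samples=600, n_features=6, random_state=0)
X, y = X_all[:500], y_all[:500]
X_new = X_all[500:]  # treat these labels as unknown

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # evaluation only; fits clones

# The clf object itself was never fitted by cross_val_score.
try:
    check_is_fitted(clf)
    was_fitted = True
except NotFittedError:
    was_fitted = False
print("fitted after cross_val_score:", was_fitted)  # False

# Fit once on all labeled rows, then predict the new customers.
clf.fit(X, y)
predictions = clf.predict(X_new)                # 0 or 1 per row
probabilities = clf.predict_proba(X_new)[:, 1]  # estimated P(Active)
```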

Can someone explain how I can proceed? If additional info is needed, please let me know in the comments.

Thanks in advance!

Upvotes: 2

Views: 1568

Answers (2)

Marvin Taschenberger

Reputation: 607

As for the statistical question: yes, it can happen, either because your data has little noise or because of the scenario Clock Slave mentioned in the comments.
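One quick sanity check for a suspiciously high accuracy (not necessarily the scenario from the comments, which are not reproduced here) is to compare the model against a baseline that always predicts the most frequent class. With heavily imbalanced classes, that baseline alone can score close to the model. A minimal sketch, assuming hypothetical synthetic data in which roughly 95% of rows belong to one class:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical imbalanced data: about 95% of rows in class 0.
X, y = make_classification(
    n_samples=1000, n_features=6, weights=[0.95], random_state=0
)

baseline = DummyClassifier(strategy="most_frequent")  # ignores the features
model = LogisticRegression(max_iter=1000)

baseline_acc = cross_val_score(baseline, X, y, cv=5).mean()
model_acc = cross_val_score(model, X, y, cv=5).mean()
print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

If the two numbers are close, accuracy alone says little about how well the model identifies the minority class.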

To reuse the classifier, you could pickle it (save it as a binary file with the pickle module), then load it whenever you need it and call the clf.predict() method on the new data:

import pickle

# Fit the classifier first and name the fitted object clf
with open('clf.pickle', 'wb') as file:
    pickle.dump(clf, file, pickle.HIGHEST_PROTOCOL)

Then, later, you can load it:

import pickle

with open('clf.pickle', 'rb') as file:
    clf = pickle.load(file)

# Now predict on the new DataFrame df (same feature columns, in the same order as in training)
pred = clf.predict(df.values)
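A self-contained round trip of this approach, as a sketch: it assumes pandas DataFrames with hypothetical column names and writes the pickle to a temporary directory. Selecting the training columns by name before calling `predict` guards against a new file that lists its columns in a different order, which `df.values` would pass through silently.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_cols = ["visits", "tenure_months", "support_tickets"]  # hypothetical names

# Hypothetical training data with a binary Active (1) / Not active (0) target.
train = pd.DataFrame(rng.normal(size=(200, 3)), columns=feature_cols)
train["active"] = (train["visits"] + train["tenure_months"] > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
clf.fit(train[feature_cols].values, train["active"])

path = os.path.join(tempfile.mkdtemp(), "clf.pickle")
with open(path, "wb") as f:
    pickle.dump(clf, f, pickle.HIGHEST_PROTOCOL)

# Later, possibly in another session: load the fitted model.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# New data whose columns arrive in a different order.
new_df = pd.DataFrame(rng.normal(size=(5, 3)), columns=feature_cols[::-1])
pred = loaded.predict(new_df[feature_cols].values)  # reorder columns first
same = np.array_equal(pred, clf.predict(new_df[feature_cols].values))
```

Only unpickle files you trust, and load them with the same scikit-learn version that saved them.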

Upvotes: 3

Nati

Reputation: 131

Besides pickle, joblib can be used as well:

from sklearn.linear_model import LogisticRegression
import joblib  # sklearn.externals.joblib was deprecated and removed in scikit-learn 0.23

# assume X, Y (training data) and X_test, Y_test are already defined

model = LogisticRegression()
model.fit(X, Y)

# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)

# load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
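The snippet above scores the reloaded model on labeled test data. For the question's case, new customers whose outcome is not known yet, the loaded model's `predict` and `predict_proba` methods apply instead. A minimal sketch, assuming synthetic data; `X_new` is a hypothetical unlabeled set with the same feature columns, and the file is written to a temporary directory.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical labeled data (X, Y) plus 50 new, unlabeled rows (X_new).
X_all, Y_all = make_classification(n_samples=550, n_features=6, random_state=1)
X, Y = X_all[:500], Y_all[:500]
X_new = X_all[500:]  # labels for these rows are treated as unknown

model = LogisticRegression(max_iter=1000)
model.fit(X, Y)

filename = os.path.join(tempfile.mkdtemp(), "finalized_model.sav")
joblib.dump(model, filename)

loaded_model = joblib.load(filename)
predicted_class = loaded_model.predict(X_new)          # 1 = Active, 0 = Not active
prob_active = loaded_model.predict_proba(X_new)[:, 1]  # estimated P(Active)
```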

Upvotes: 2
