Making predictions with logistic regression (Python scikit-learn)

Question

I am relatively new to logistic regression with scikit-learn in Python. After reading some topics and watching some demos, I decided to dive in myself.

Basically, I am trying to predict whether customers convert, based on some features. The outcome is either Active (1) or Not active (0). I tried KNN and logistic regression. With KNN I get an average accuracy of 0.893 and with logistic regression 0.994. The latter seems very high. Is that even realistic or possible?

Anyway, suppose my model is indeed very accurate. I would now like to import a new dataset with the same feature columns and predict their conversions (they end this month). In the case above I used cross_val_score to get the accuracy scores.

Do I now need to import the new set and somehow apply this model to it? I do not want to train it again; I just want to use it.

Can someone please inform me how I can proceed? If additional info is needed, please comment on that.

Thanks in advance!

Marvin Taschenberger · Accepted Answer

For the statistics question: yes, that can happen. Either your data has very little noise, or you are in the scenario Clock Slave mentioned in the comments.
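One way to sanity-check a very high accuracy is to compare it with a trivial baseline. The sketch below is an illustration only: it assumes class imbalance as one possible explanation (not necessarily the scenario raised in the comments), and it uses synthetic data from make_classification as a stand-in for the customer data.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, deliberately imbalanced data (assumption: about 95% of rows are class 0).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.95, 0.05], random_state=0
)

# Baseline that always predicts the most frequent class.
baseline_acc = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5
).mean()

# Logistic regression evaluated with the same cross-validation setup.
model_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5
).mean()

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"logistic regression accuracy: {model_acc:.3f}")
```

If the baseline is already close to the model's score, accuracy alone says little, and a metric such as precision, recall, or ROC AUC (for example via the scoring argument of cross_val_score) is probably more informative.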

To reuse the classifier, you can pickle it (save it as a binary file with the pickle module), then load it whenever you need it and call the clf.predict() method on the new data.

import pickle

# Train the classifier first and name the fitted object clf
with open('clf.pickle', 'wb') as f:
    pickle.dump(clf, f, pickle.HIGHEST_PROTOCOL)

And then later you can load it:

import pickle

with open('clf.pickle', 'rb') as f:
    clf = pickle.load(f)

# Now predict on the new dataframe df, which must contain the same
# feature columns, in the same order, as the training data
pred = clf.predict(df.values)
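Note that cross_val_score fits clones of the estimator, so the object passed to it is still unfitted afterwards; it has to be fitted explicitly before it can be pickled and used. The following is a minimal end-to-end sketch under stated assumptions: the data is synthetic (make_classification), the last 10 rows play the role of the "new" customers, and pickle.dumps / pickle.loads stand in for the file-based dump and load shown above so that the example runs without touching the disk.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data (assumption): 200 labelled rows
# for training plus 10 rows treated as the new, unlabelled customers.
X, y = make_classification(n_samples=210, n_features=5, random_state=0)
X_train, y_train = X[:200], y[:200]
X_new = X[200:]

clf = LogisticRegression(max_iter=1000)

# Estimate accuracy; this fits clones of clf and leaves clf itself unfitted.
scores = cross_val_score(clf, X_train, y_train, cv=5)

# Fit once on all labelled data to obtain the model that will be reused.
clf.fit(X_train, y_train)

# Serialize and restore (in-memory equivalent of pickle.dump / pickle.load).
blob = pickle.dumps(clf, pickle.HIGHEST_PROTOCOL)
loaded_clf = pickle.loads(blob)

# Predict without retraining: hard labels and probability of class 1.
pred = loaded_clf.predict(X_new)
proba_active = loaded_clf.predict_proba(X_new)[:, 1]

print(scores.mean())
print(pred)
```

A pickled model should be loaded with the same scikit-learn version that created it, and pickle files should only be loaded from sources you trust.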
