Reputation: 2475
I am relatively new to logistic regression using SciKit learn in Python. After reading some topics and viewing some demo's, I decided to dive in myself.
So, basically, I am trying to predict the conversion rate of customers, based on some features. The outcome is either Active (1) or Not active (0). I tried KNN and logistic regression. With KNN I get an average accuracy of 0.893
and with logistic regression 0.994
. The latter seems so high, is that even realistic / possible?
Anyway: Suppose that my model is indeed very accurate, I would now like to import a new dataset with the same feauture columns and predict their conversions (they end this month). In the case above I used cross_val_score
to get the accuracy scores.
Do I now need to import the new set, somehow fit that new set to this model. (not training it again, now I just want to use it)
Can someone please inform me how I can proceed? If additional info is needed, please comment on that.
Thanks in advance!
Upvotes: 2
Views: 1568
Reputation: 607
For the statistic question: of course, it can happen, either your data is having little noise or the scenario Clock Slave mentioned in the comments.
For the import of the classifier, you could pickle
it ( save it as a binary with the pickle
module, and then just load it whenever you need it and use the clf.predict()
method on the new data
import pickle
#Do the classification and name the fitted object clf
with open('clf.pickle', 'wb') as file :
pickle.dump(clf,file,pickle.HIGHEST_PROTOCOL)
And then later you can load it
import pickle
with open('clf.pickle', 'rb') as file :
clf =pickle.load(file)
# Now predict on the new dataframe df as
pred = clf.predict(df.values)
Upvotes: 3
Reputation: 131
Beside 'Pickle', 'joblib' can be used as well.
##
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib
assume there X,Y, already defined
model = LogisticRegression()
model.fit(X, Y)
save the model to disk
filename = 'finalized_model.sav'
joblib.dump(model, filename)
load the model from disk
loaded_model = joblib.load(filename)
result = loaded_model.score(X_test, Y_test)
Upvotes: 2