I would like to plot `y_test` and the predictions in a scatter plot. I am using logistic regression as the model.
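```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# df is a pandas DataFrame: 'Spam' holds the message text, 'Label' the class
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Spam'])  # bag-of-words token counts
y = df['Label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)
lr = LogisticRegression(solver='liblinear').fit(X_train, y_train)
pred_log = lr.predict(X_test)  # predicted class labels, not probabilities
```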
I have tried the following:
```python
import matplotlib.pyplot as plt

# Plot the model
plt.scatter(y_test, pred_log)
plt.xlabel("True Values")
plt.ylabel("Predictions")
plt.show()
```
and got this plot (the image is not included here), which I do not think is what I should expect.
`y_test` has shape (250,), and so does `pred_log`.
Am I plotting the wrong variables, or are they right? I have no idea what the plot, with only those four points, means. I would have expected more dots in the plot, but maybe I am wrong.
Please let me know if you need more information. Thanks.
Upvotes: 0
Reputation: 4130
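I think you know that LogisticRegression is a classification algorithm. For binary classification it predicts, for each sample, whether it belongs to class 0 or class 1. A scatter plot is not a useful way to visualize classification results: with two classes every (true, predicted) pair is one of only four combinations, so all 250 points are drawn on top of each other at four positions. If you want to visualize how the model performs, consider a confusion matrix instead.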
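To see why the scatter plot in the question shows only four dots, here is a minimal sketch with made-up binary labels (the arrays `y_true` and `y_pred` are hypothetical stand-ins for `y_test` and `pred_log`). Counting the distinct (true, predicted) pairs shows that every point lands on one of at most four positions, and the number of points at each position is exactly a cell of the confusion matrix.

```python
from collections import Counter

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels standing in for y_test and pred_log
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

# Each scatter point is a (true, predicted) pair; with two classes there are
# at most four distinct pairs, so many points are drawn on top of each other.
pair_counts = Counter(zip(y_true.tolist(), y_pred.tolist()))
print(sorted(pair_counts.items()))

# The same counts as a 2x2 table: rows = true label, columns = predicted label
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The scatter plot hides these counts because overlapping dots look identical, while the confusion matrix reports them directly.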
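```python
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, pred_log)  # rows: true labels, columns: predicted labels
sns.heatmap(cm, annot=True, fmt='d')  # fmt='d' shows the counts as integers
```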
A confusion matrix shows, for each class, how many samples were predicted correctly and how many were misclassified. From it you can calculate how accurate the model is, as well as other metrics such as precision, recall and F1 score.
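These metrics can be computed directly from the four cells of a binary confusion matrix. The following minimal sketch uses made-up labels (`y_true` and `y_pred` are hypothetical stand-ins for `y_test` and `pred_log`), treats class 1 as the positive class, and checks the hand-computed values against scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score`.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels; class 1 is treated as the positive class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

# For binary labels 0/1, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are truly positive
recall = tp / (tp + fn)                     # share of true positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

# The hand-computed values agree with scikit-learn's metric functions
assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
print(accuracy, precision, recall, f1)
```

If the `Label` column holds strings (for example 'spam' and 'ham') rather than 0/1, pass `pos_label` with the positive class name to the precision, recall and F1 functions. `confusion_matrix` orders the classes by sorted label value unless you pass `labels` explicitly.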