user4977763

heatmap for logistic regression

I am new to Python and have a small problem with my code:

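import pickle

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
from sklearn import linear_model

# Load the train/test split (feature matrices and labels)
with open("data.p", "rb") as f:
    X_train, Y_train, Xtest, ytest = pickle.load(f)

h = 100  # grid step size
# Span each plot axis from its own feature column, padded by 1
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))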

logreg = linear_model.LogisticRegression(C=1.0, penalty='l2', tol=1e-6).fit(X_train, Y_train)

grid_data = np.c_[xx.ravel(), yy.ravel()]
Z = logreg.predict_proba(grid_data)[:,1]
Z = Z.reshape(xx.shape)

yhat = logreg.predict_proba(Xtest)[:,1]
r = scipy.stats.pearsonr(yhat, ytest)[0]

plt.imshow(Z, extent=[xx.min(), xx.max(), yy.max(), yy.min()])

plt.plot(Xtest[ytest==0, 0], Xtest[ytest==0, 1], 'co')
plt.plot(Xtest[ytest==1, 0], Xtest[ytest==1, 1], 'ro')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title('r=' + str(r))

plt.show()

When I run this code with a data file of shape (x, 2), it works flawlessly. But I also have data with more than 2 columns (12, to be exact), and for that Python gives me this error:

Z = logreg.predict_proba(grid_data)[:,1]
  File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\logistic.py", line 128, in predict_proba
    return self._predict_proba_lr(X)
  File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\base.py", line 229, in _predict_proba_lr
    prob = self.decision_function(X)
  File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\base.py", line 196, in decision_function
    % (X.shape[1], n_features))
ValueError: X has 2 features per sample; expecting 12

Somehow I need to make grid_data have 12 columns, but I don't know how.

Edit: added the rest of the code.

Upvotes: 0

Views: 1027

Answers (1)

eqzx

Reputation: 5559

Your model is fit to 12-dimensional data (X_train.shape is (N, 12)), but you're trying to run prediction on 2-dimensional data (look at the shape of grid_data). It doesn't make sense to predict on 2D features when the model was fit on 12D features.
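The mismatch is easy to reproduce. Here is a minimal sketch that assumes synthetic random data in place of data.p (the sizes and the labelling rule are made up for illustration, and the exact wording of the error depends on the scikit-learn version):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))   # synthetic stand-in for the 12-column data
Y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

logreg = LogisticRegression().fit(X_train, Y_train)

# A meshgrid over two axes always yields 2 columns, whatever the model was fit on
xx, yy = np.meshgrid(np.linspace(-3, 3, 5), np.linspace(-3, 3, 5))
grid_data = np.c_[xx.ravel(), yy.ravel()]
print(X_train.shape, grid_data.shape)  # (200, 12) (25, 2)

try:
    logreg.predict_proba(grid_data)
    raised = False
except ValueError as err:
    # The model refuses input whose column count differs from the training data
    raised = True
    print("ValueError:", err)
```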

I'm guessing your data exists as features within that grid, so you could use something like nearest neighbours to retrieve the closest input point from X_train for each grid point (if your data lies exactly on the grid, the lookup reduces to indexing it correctly), and then associate the output predictions with the grid points.
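A rough sketch of that nearest-neighbour lookup, under some assumptions that are not in the question: the data is synthetic, the two plotted axes are taken to be columns 0 and 1 of X_train (plot_cols is a hypothetical name), and scikit-learn's NearestNeighbors does the search:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 12))   # synthetic stand-in for the 12-column data
Y_train = (X_train[:, 0] - X_train[:, 1] > 0).astype(int)
logreg = LogisticRegression().fit(X_train, Y_train)

# Assumption: columns 0 and 1 are the two features shown on the plot axes
plot_cols = [0, 1]
xx, yy = np.meshgrid(np.linspace(-3, 3, 40), np.linspace(-3, 3, 40))
grid_2d = np.c_[xx.ravel(), yy.ravel()]

# For each grid point, find the training sample closest to it in the plotted columns
nn = NearestNeighbors(n_neighbors=1).fit(X_train[:, plot_cols])
_, idx = nn.kneighbors(grid_2d)        # idx has shape (n_grid_points, 1)

# Predict on the full 12-feature rows of those neighbours, then map back to the grid
grid_data = X_train[idx[:, 0]]         # shape (n_grid_points, 12)
Z = logreg.predict_proba(grid_data)[:, 1].reshape(xx.shape)
```

Z then has the same shape as xx and can be passed to plt.imshow as in the question. The resulting image is piecewise constant: each cell shows the prediction for whichever training point happens to be nearest, so it reflects the data as much as the model.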

Upvotes: 1
