I am new to Python and have a small problem with my code:
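import pickle
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt
from sklearn import linear_model

# Load the train/test split from the pickle file
with open("data.p", "rb") as f:
    X_train, Y_train, Xtest, ytest = pickle.load(f)
h = 100  # grid step size
# Grid bounds taken from the first two feature columns
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))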
logreg = linear_model.LogisticRegression(C=1.0, penalty='l2', tol=1e-6).fit(X_train, Y_train)
grid_data = np.c_[xx.ravel(), yy.ravel()]
Z = logreg.predict_proba(grid_data)[:,1]
Z = Z.reshape(xx.shape)
yhat = logreg.predict_proba(Xtest)[:,1]
r = scipy.stats.pearsonr(yhat, ytest)[0]
plt.imshow(Z, extent=[xx.min(), xx.max(), yy.max(), yy.min()])
plt.plot(Xtest[ytest==0, 0], Xtest[ytest==0, 1], 'co')
plt.plot(Xtest[ytest==1, 0], Xtest[ytest==1, 1], 'ro')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title('r=' + str(r))
plt.show()
When I run this code with a data file of shape (x, 2), it works flawlessly.
But I also have data with more than 2 columns (12, to be exact),
and for that Python raises this error:
Z = logreg.predict_proba(grid_data)[:,1]
File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\logistic.py", line 128, in predict_proba
return self._predict_proba_lr(X)
File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\base.py", line 229, in _predict_proba_lr
prob = self.decision_function(X)
File "D:\IDE\Anaconda\lib\site-packages\sklearn\linear_model\base.py", line 196, in decision_function
% (X.shape[1], n_features))
ValueError: X has 2 features per sample; expecting 12
Somehow I need to make grid_data have 12 columns, but I don't know how.
edit:
added rest of the code
Upvotes: 0
Views: 1027
Reputation: 5559
Your model is fit to 12-dimensional data (X_train.shape is (N, 12)),
and you're trying to run prediction on 2-dimensional data (look at the shape of grid_data).
It doesn't make sense to predict values on 2D features when the model was fit with 12D features.
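To see the mismatch directly, here is a minimal sketch that reproduces it. The random data is a made-up stand-in for the asker's data.p file, and the grid range is arbitrary. Only the shapes matter here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data (assumption): 100 samples with 12 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 12))
Y_train = (X_train[:, 0] > 0).astype(int)
logreg = LogisticRegression().fit(X_train, Y_train)

# A 2D plotting grid built the same way as in the question.
xx, yy = np.meshgrid(np.arange(-3, 3, 0.5), np.arange(-3, 3, 0.5))
grid_data = np.c_[xx.ravel(), yy.ravel()]

# One coefficient per training feature, so coef_ has 12 columns.
n_expected = logreg.coef_.shape[1]
n_given = grid_data.shape[1]
print("model expects", n_expected, "features; grid has", n_given)

# Predicting on the 2-column grid fails with a ValueError.
try:
    logreg.predict_proba(grid_data)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Comparing grid_data.shape[1] with logreg.coef_.shape[1] before calling predict_proba is a quick way to catch this kind of mismatch early.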
I'm guessing your data points lie at positions within that grid, so you could use something like nearest neighbours to retrieve, for every grid point, the closest input point from X_train
(if your data lies exactly on the grid, the lookup reduces to indexing it correctly), and then associate the resulting predictions with the grid points.
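The nearest-neighbour lookup described above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the data is synthetic, and columns 0 and 1 of X_train are taken to be the coordinates that define the plotting grid. Swap in whichever two columns actually locate your samples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the real data (assumption): 200 samples, 12 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))
Y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
logreg = LogisticRegression(C=1.0).fit(X_train, Y_train)

# Assumption: columns 0 and 1 are the coordinates being plotted.
coords = X_train[:, :2]
xx, yy = np.meshgrid(
    np.linspace(coords[:, 0].min(), coords[:, 0].max(), 50),
    np.linspace(coords[:, 1].min(), coords[:, 1].max(), 40),
)
grid_2d = np.c_[xx.ravel(), yy.ravel()]

# For each grid point, find the index of the closest training sample in 2D.
nn = NearestNeighbors(n_neighbors=1).fit(coords)
_, idx = nn.kneighbors(grid_2d)

# Fetch the full 12-feature row for each grid point and predict on that.
grid_12d = X_train[idx.ravel()]
Z = logreg.predict_proba(grid_12d)[:, 1].reshape(xx.shape)
```

Z then has the same shape as xx and can be passed to plt.imshow as in the question. Each grid cell shows the prediction for its nearest training sample, not a prediction for the grid point itself.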
Upvotes: 1