Udacity: Assignment 3: ValueError: bad input shape (1000, 10)

Question

I am working on Assignment 3: Regularization. After taking a look at that GitHub repository, I tried to solve the assignment on my own, but I am getting a runtime error. Note that I picked a smaller dataset size than the one used in the link.

This is the situation:

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
#Training set (20000, 784) (20000, 10)
#Validation set (1000, 784) (1000, 10)
#Test set (1000, 784) (1000, 10)

and here is the problem:

from sklearn.linear_model import LogisticRegression

original_train_labels = train_labels

logit_clf = LogisticRegression(penalty='l2')
logit_clf.fit(train_dataset[:1000,:], original_train_labels[:1000])

which when run, gives:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
      4 
      5 logit_clf = LogisticRegression(penalty='l2')
----> 6 logit_clf.fit(train_dataset[:1000,:], original_train_labels[:1000])
      7 predicted = logit_clf.predict(test_dataset)
      8 print('accuracy', accuracy((np.arange(num_labels) == predicted[:,None]).astype(np.float32), test_labels), '%')

/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.pyc in fit(self, X, y, sample_weight)
   1140 
   1141         X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64, 
-> 1142                          order="C")
   1143         check_classification_targets(y)
   1144         self.classes_ = np.unique(y)

/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    513                         dtype=None)
    514     else:
--> 515         y = column_or_1d(y, warn=True)
    516         _assert_all_finite(y)
    517     if y_numeric and y.dtype.kind == 'O':

/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.pyc in column_or_1d(y, warn)
    549         return np.ravel(y)
    550 
--> 551     raise ValueError("bad input shape {0}".format(shape))
    552 
    553 

ValueError: bad input shape (1000, 10)

Any idea on how to fix this?

Maksim Khaitovich · Accepted Answer

You are using one-hot encoding for train_labels, meaning it has a shape like [1000, 10]: 1000 samples, each with 10 'columns', where a 1 marks the class the sample belongs to. That format is needed for the neural networks, but LogisticRegression from sklearn expects a 1-D vector of 1000 entries (shape [1000]; a single column of shape [1000, 1] is also accepted, with a warning), where each entry is an int indicating the target class. Convert the one-hot encoding to integers with the argmax function and you should be all set.
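
As a minimal sketch of that conversion, the snippet below builds a stand-in for the one-hot labels and collapses it with np.argmax. The names one_hot_labels, true_classes and int_labels are made up for illustration, and the random labels only mimic the shape of train_labels from the question; they are not the real dataset.

```python
import numpy as np

num_labels = 10

# Hypothetical stand-in for the one-hot train_labels from the question:
# 1000 random class ids, expanded to a (1000, 10) one-hot matrix.
rng = np.random.RandomState(0)
true_classes = rng.randint(0, num_labels, size=1000)
one_hot_labels = (np.arange(num_labels) == true_classes[:, None]).astype(np.float32)

# argmax along axis 1 returns, for each row, the column index holding the 1,
# which is exactly the integer class id that sklearn expects.
int_labels = np.argmax(one_hot_labels, axis=1)

print(one_hot_labels.shape)  # 2-D one-hot matrix
print(int_labels.shape)      # 1-D vector of class ids
```

With labels in this form, the fit call from the question would presumably become logit_clf.fit(train_dataset[:1000, :], np.argmax(train_labels[:1000], axis=1)). Note that predict then returns integer class ids as well, so they have to be compared against integer test labels or re-expanded to one-hot, as the accuracy line in the traceback already does.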
