ValueError: labels shape must be [batch_size, labels_dimension], got (128, 2)

Question

I'm using TensorFlow 1.3.0 with Python 3.5.2. I'm trying to mimic the DNNClassifier from the Iris data tutorial on the TensorFlow website and am running into difficulties. I import a CSV file with about 155 rows and 15 columns, split it into training and test sets (the goal is to classify each row as a positive or negative movement), and get an error when I begin to train the classifier. Here's how the data is set up:

    #import values from csv
    mexicof1 = pd.read_csv('Source/mexicoR.csv')

    #construct pandas dataframe
    mexico_df = pd.DataFrame(mexicof1)
    #start counting from mexico.mat.2.nrow.mexico.mat...1.
    mexico_dff = pd.DataFrame(mexico_df.iloc[:,1:16])
    mexico_dff.columns = ['tp1_delta','PC1','PC2','PC3','PC4','PC5','PC6','PC7', \
                  'PC8', 'PC9', 'PC10', 'PC11', 'PC12', 'PC13', 'PC14']


    #binary assignment for positive/negative values
    for i in range(0,155):
        if(mexico_dff.iloc[i,0] > 0):
            mexico_dff.iloc[i,0] = "pos"
        else:
            mexico_dff.iloc[i,0] = "neg"

    #up movement vs. down movement classification set up
    up = np.asarray([1,0])
    down = np.asarray([0,1])
    mexico_dff['tp1_delta'] = mexico_dff['tp1_delta'].map({"pos": up, "neg": down})


    #Break into training and test data
    #data: independent values
    #labels: classification
    mexico_train_DNN1data = mexico_dff.iloc[0:150, 1:15]
    mexico_train_DNN1labels = mexico_dff.iloc[0:150, 0]
    mexico_test_DNN1data = mexico_dff.iloc[150:156, 1:15]
    mexico_test_DNN1labels = mexico_dff.iloc[150:156, 0]

    #Construct numpy arrays for the training and test labels
    temptrain = []
    for i in range(0, len(mexico_train_DNN1labels)):
        temptrain.append(mexico_train_DNN1labels.iloc[i])
    temptrainFIN = np.array(temptrain, dtype = np.float32)

    temptest = []
    for i in range(0, len(mexico_test_DNN1labels)):
        temptest.append(mexico_test_DNN1labels.iloc[i])
    temptestFIN = np.array(temptest, dtype = np.float32)

    #set up NumPy arrays
    mTrainDat = np.array(mexico_train_DNN1data, dtype = np.float32)
    mTrainLab = temptrainFIN
    mTestDat = np.array(mexico_test_DNN1data, dtype = np.float32)
    mTestLab = temptestFIN

Doing this gives me data that looks like the following:

    #Independent value output
    mTestDat
    Out[289]: 
    array([[-0.08404002, -3.07483053,  0.41106853, ..., -0.08682428,
             0.32954004, -0.36451185],
           [-0.31538665, -2.23493481,  1.97653472, ...,  0.35220796,
             0.09061374, -0.59035355],
           [ 0.44257978, -3.04786181, -0.6633662 , ...,  1.34870672,
             0.43879321,  0.26306254],
           ...,
           [ 2.38574553,  0.09045095, -0.09710167, ...,  1.20889878,
             0.00937434, -0.06398607],
           [ 1.68626559,  0.65349185,  0.23625408, ..., -1.16267788,
             0.45464727, -1.14916229],
           [ 1.58263958,  0.1223636 , -0.12084256, ...,  0.7947616 ,
            -0.47359121,  0.28013545]], dtype=float32)

    #Classification labels (up or down movement) output
    mTestLab
    Out[290]: 
    array([[ 0.,  1.],
           [ 0.,  1.],
           [ 0.,  1.],
           [ 1.,  0.],
           [ 0.,  1.],
           [ 1.,  0.],
    ........
           [ 1.,  0.],
           [ 0.,  1.],
           [ 0.,  1.],
           [ 0.,  1.]], dtype=float32)

Following the tutorial from this setup, I can run the code below as far as the classifier.train() call, at which point it stops and raises the error shown after the code:

    # Specify that all features have real-value data
    feature_columns = [tf.feature_column.numeric_column("x", shape=[mexico_train_DNN1data.shape[1]])]

    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                    hidden_units=[10, 20, 10],
                                    optimizer = tf.train.AdamOptimizer(0.01),
                                    n_classes=2) #representing either an up or down movement


    train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x = {"x": mTrainDat},
    y = mTrainLab,
    num_epochs = None,
    shuffle = True)

    #Now, we train the model
    classifier.train(input_fn=train_input_fn, steps = 2000)


      File "Source\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 174, in _check_labels
        (static_shape,))

    ValueError: labels shape must be [batch_size, labels_dimension], got (128, 2).

I'm not sure why I'm encountering this error. Any help is appreciated.

DomJack · Accepted Answer

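You're using one-hot encoded labels ([1, 0] or [0, 1]), but DNNClassifier expects integer class labels (i.e. 0 or 1). That is why the error reports (128, 2): a batch of 128 rows (the default batch size of numpy_input_fn), each with two label columns instead of one. To decode a one-hot encoding along the last axis, use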

    class_labels = np.argmax(one_hot_vector, axis=-1)

Note that for the binary case it might be quicker to do

    class_labels = one_hot_vector[..., 1].astype(np.int32)

though the performance difference won't be massive, and I'd probably use the more general version in case you add another class later.
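As a quick sanity check, here is a minimal sketch showing that the two decodings agree for binary one-hot labels and that both collapse the trailing axis. The array `one_hot` is toy data made up for illustration, using the question's encoding ([1, 0] for up, [0, 1] for down); it is not taken from the CSV.

```python
import numpy as np

# Toy one-hot labels in the question's encoding: [1, 0] = up, [0, 1] = down.
one_hot = np.array([[0., 1.], [1., 0.], [0., 1.], [1., 0.]], dtype=np.float32)

# General decoding: index of the largest entry along the last axis.
by_argmax = np.argmax(one_hot, axis=-1)

# Binary shortcut: the second column already equals the class index.
by_column = one_hot[..., 1].astype(np.int32)

print(one_hot.shape)    # (4, 2)
print(by_argmax.shape)  # (4,)
print(by_argmax.tolist(), by_column.tolist())
```

Note that with this encoding the decoded class 0 means "up" and class 1 means "down", which matters when interpreting predictions later.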

In your case, after you've generated your numpy labels, just add

    mTrainLab = np.argmax(mTrainLab, axis=-1)
    mTestLab = np.argmax(mTestLab, axis=-1)
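If you want something a bit more defensive, the same fix can be wrapped in a small function. This is only a sketch: `to_class_ids` is a hypothetical helper name (not part of TensorFlow or NumPy), and `demo_labels` is stand-in data rather than your actual arrays. It accepts either one-hot rows or labels that are already class ids, and fails early with a readable message instead of deep inside the estimator.

```python
import numpy as np

def to_class_ids(labels, n_classes):
    """Hypothetical helper: return 1-D integer class ids from one-hot or id labels."""
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] == n_classes:
        # One-hot rows such as [1, 0]: take the position of the largest entry.
        labels = np.argmax(labels, axis=-1)
    elif labels.ndim == 2 and labels.shape[1] == 1:
        # Column vector of class ids: flatten to 1-D.
        labels = labels[:, 0]
    elif labels.ndim != 1:
        raise ValueError("unexpected labels shape: %s" % (labels.shape,))
    labels = labels.astype(np.int32)
    if labels.size and (labels.min() < 0 or labels.max() >= n_classes):
        raise ValueError("class ids must lie in [0, n_classes)")
    return labels

# Stand-in for one-hot labels shaped like mTrainLab in the question.
demo_labels = np.array([[0., 1.], [1., 0.], [0., 1.]], dtype=np.float32)
ids = to_class_ids(demo_labels, n_classes=2)
print(ids.shape, ids.tolist())
```

You would then pass the returned array as `y` to numpy_input_fn in place of the one-hot array.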
