Reputation: 175

ValueError: logits and labels must have the same shape ((None, 10) vs (None, 1))

I am new to TensorFlow and was trying to build a simple model that outputs the probability of installation (the install column).

Here is a subset of the dataset:

{'A': {0: 12, 2: 28, 3: 26, 4: 9, 5: 36},
 'B': {0: 10, 2: 17, 3: 22, 4: 2, 5: 31},
 'C': {0: 1, 2: 0, 3: 5, 4: 0, 5: 1},
 'D': {0: 5, 2: 0, 3: 0, 4: 0, 5: 0},
 'E': {0: 12, 2: 1, 3: 4, 4: 3, 5: 1},
 'F': {0: 12, 2: 2, 3: 14, 4: 9, 5: 11},
 'install': {0: 0, 2: 0, 3: 1, 4: 0, 5: 0},
 'G': {0: 21, 2: 12, 3: 8, 4: 13, 5: 19},
 'H': {0: 0, 2: 5, 3: 1, 4: 6, 5: 5},
 'I': {0: 21, 2: 22, 3: 5, 4: 10, 5: 20},
 'J': {0: 0.0, 2: 136.5, 3: 0.0, 4: 0.1, 5: 29.5},
 'K': {0: 0.15220949263502456,
  2: 0.08139534883720931,
  3: 0.15625,
  4: 0.15384584755440725,
  5: 0.04188829787234043},
 'L': {0: 649, 2: 379, 3: 531, 4: 660, 5: 242},
 'M': {0: 0, 2: 0, 3: 0, 4: 1, 5: 1},
 'N': {0: 1, 2: 1, 3: 1, 4: 0, 5: 0},
 'O': {0: 0, 2: 1, 3: 0, 4: 1, 5: 0},
 'P': {0: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'Q': {0: 1, 2: 0, 3: 1, 4: 0, 5: 1}}

And here is the code I was working on:

X = df.drop('install', axis=1) #data
y = df['install'] #target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42, test_size = 0.3)

X_train = ss.fit_transform(X_train)
X_test = ss.fit_transform(X_test)

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='softmax'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(10)
])

loss = keras.losses.BinaryCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(lr=0.001)
metrics = ["accuracy"]

model.compile(loss=loss, optimizer=optim, metrics=metrics)

batch_size = 32
epoch = 5
model.fit(X_train, y_train, batch_size=batch_size, epochs=epoch, shuffle=True, verbose=1)

Could you help me understand the error? I understand that the problem is related to the shapes of my X and y.

Upvotes: 3

Answers (3)

Abhishek Prajapat

Reputation: 1888

Note: You have not specified which class the ss object belongs to, so I will leave it out of the discussion below.

First, let's discuss your target, i.e. the install column. From the values, I assume that your problem is binary classification, i.e. predicting 0 or 1, and that you want the probability of each class.

For this, you have to define your model as below.

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(2, activation='softmax')
])

'''
Note: I have changed the activation of the first `Dense` layer from
`softmax` to `relu`, as `softmax` is not ideal for inner layers because it
greatly reduces the information passed on from each node. Having `softmax`
there will not raise any error, but it is methodologically wrong.

The next major change is reducing the number of units in the last `Dense`
layer from 10 to 2. What you want is the probability of the target being
either 0 or 1. If the output of your model is `[a, b]`, where a is a value
corresponding to 0 and b is a value corresponding to 1, then you can turn
them into probabilities using the `softmax` activation. Without an
activation, the values we get are called 'logits'.
'''

# Now you have to change your loss function as below
loss = tf.keras.losses.SparseCategoricalCrossentropy()

# The rest is the same. After training the model with your code, we run a trial prediction.

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[9.9999726e-01, 2.7777487e-06],
       [9.5156413e-01, 4.8435837e-02]], dtype=float32)

This says the probability of sample 1 being 0 is '9.9999726e-01', i.e.
'0.999..', and of it being 1 is '2.7777487e-06', i.e. '0.00000277..', and
these sum to 1. The same holds for sample 2.
'''
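
To see why the two outputs always sum to 1, here is a minimal plain-Python sketch of what a softmax does to a pair of logits. The function name `softmax` and the example logits are illustrative assumptions, not values taken from the model above.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    # Subtract the maximum first so exp() cannot overflow; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for class 0 and class 1 of a single sample.
probs = softmax([3.0, -1.0])
print(probs)       # first entry: P(install = 0), second entry: P(install = 1)
print(sum(probs))  # 1.0 up to floating-point rounding
```

The larger logit gets the larger probability, and only the difference between the two logits matters.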

There is another way of doing this. Since you have only one label, once you have the probability of that label being 1, you can get the probability of it being 0 by subtracting the first from 1. You can implement it as below:

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(1, activation='sigmoid')
])

'''
The difference between 'softmax' and 'sigmoid' is that 'softmax' is applied
across all the units of a layer jointly, whereas 'sigmoid' is applied to
each unit individually. So you can say that 'softmax' is applied on the
'layer' and 'sigmoid' is applied on the 'units'.

Now the output of the 'sigmoid' is the probability of the result being 1. So
the result is either 0 or 1, depending on how the output probability
compares with some threshold. Hence we will now use a different loss,
BinaryCrossentropy, as the target values are binary (either 0 or 1).
'''

loss = keras.losses.BinaryCrossentropy() # again without from_logits

# We once again train the model using the rest of the code and analyze
# the outputs.

preds = model.predict(X_test)
preds
'''
This gives the results:
array([[1.6424768e-13],
       [2.0349980e-06]], dtype=float32)

So for sample 1 we have the probability of it being '1' as '1.6424768e-13'
and as we have only '1' and '0' the probability of it being '0' is '1 -
1.6424768e-13'. Same for the sample 2.
'''
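
If hard 0/1 predictions are needed rather than probabilities, the threshold step mentioned above can be sketched in plain Python. The helper name `to_labels` and the 0.5 cut-off are assumptions chosen for illustration; the probabilities are copied from the sigmoid output shown above.

```python
# Probabilities copied from the sigmoid output shown above (one row per sample).
preds = [[1.6424768e-13], [2.0349980e-06]]

def to_labels(probabilities, threshold=0.5):
    """Map each P(install = 1) to a hard 0/1 label using an assumed cut-off."""
    return [1 if row[0] >= threshold else 0 for row in probabilities]

labels = to_labels(preds)

# The complementary probability, P(install = 0), is simply 1 - P(install = 1).
p_not_install = [1.0 - row[0] for row in preds]

print(labels)  # [0, 0]
```

A different threshold may be more appropriate when the two classes are imbalanced or when false positives and false negatives have different costs.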

Now coming to the answer from @Mattpats. That answer will also work, but in that case the output is not a probability. Instead you get the logits, because no activation is used and the loss is calculated on the logits by specifying the argument from_logits=True. To get the probabilities from the logits, you have to apply a sigmoid as below:

preds = model.predict(X_test)
sigmoid_preds = tf.math.sigmoid(preds).numpy()
preds, sigmoid_preds
'''
This gives the following results:
preds = array([[-51.056973],
              [-32.444508]], dtype=float32)

sigmoid_preds = array([[6.702527e-23],
                      [8.119502e-15]], dtype=float32)
'''
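
The same conversion can be checked by hand without TensorFlow. The sketch below applies a numerically stable sigmoid to the two logits printed above; the function name `sigmoid` is only an illustrative helper, not part of any library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function: 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for negative z, exp(z) is tiny but cannot overflow
    return ez / (1.0 + ez)

logits = [-51.056973, -32.444508]  # logits copied from the output above
probs = [sigmoid(z) for z in logits]
print(probs)  # roughly 6.70e-23 and 8.12e-15
```

Both values agree with the `sigmoid_preds` array above up to float32 rounding.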

Upvotes: 2

TC Arlen

Reputation: 1482

As written now, you create training labels y_train with shape (3,), and each training label is just 0 or 1. The network, however, is set up to predict 10 categories. That's what this line does in the model creation phase:

keras.layers.Dense(10)

To change to binary classification, it is recommended to change this final layer to

keras.layers.Dense(1, activation='sigmoid')

And you'll also need to modify the loss to this:

loss = keras.losses.BinaryCrossentropy()

If you'd instead like to do multi-class classification with 10 classes, then you'll need to modify your y_train to be an array with 10 columns.
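
For the multi-class route, "an array with 10 columns" usually means one-hot encoded labels. Here is a plain-Python sketch of that encoding; the helper name `one_hot` and the sample class ids are hypothetical, and in Keras `keras.utils.to_categorical` performs the same conversion. A categorical cross-entropy loss would then normally replace BinaryCrossentropy.

```python
def one_hot(labels, num_classes):
    """Convert integer class ids into rows with a single 1 per row."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1  # mark the column that corresponds to this class
        rows.append(row)
    return rows

y = [3, 0, 9]              # hypothetical class ids for three samples
encoded = one_hot(y, 10)   # shape (3, 10), matching a Dense(10) output
print(len(encoded), len(encoded[0]))  # 3 10
```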

Upvotes: 1

Mattpats

Reputation: 544

I believe the last layer in your network is outputting 10 values, when it should be 1.

model = keras.models.Sequential([
  keras.layers.Flatten(),
  keras.layers.Dense(128, activation='softmax'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(1) # needs to be 1
])
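
With a single output unit and no activation, the model emits one logit per sample, which is what the `BinaryCrossentropy(from_logits=True)` loss in the question expects. As a rough sketch of the per-sample quantity such a loss computes, here is the standard numerically stable formula in plain Python. The function names are illustrative, and this is not Keras's actual implementation.

```python
import math

def bce_from_logit(z, y):
    """Binary cross-entropy for one logit z and one label y in {0, 1}.

    Equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(z),
    rearranged so that exp() never overflows.
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

def bce_from_probability(p, y):
    """The textbook form, computed on a probability instead of a logit."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

z, y = 0.8, 1                      # arbitrary logit and label for illustration
p = 1.0 / (1.0 + math.exp(-z))     # sigmoid of the logit
print(bce_from_logit(z, y), bce_from_probability(p, y))  # the two values agree
```

Working on logits avoids taking the log of a probability that has been rounded to exactly 0 or 1, which is the usual reason for preferring `from_logits=True` during training.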

Upvotes: 0
