Reputation: 51
I'm trying to create a U-Net for semantic segmentation. I've been following this repo, which contains the code from this article. I'm using the Scene Parsing 150 dataset instead of the one used in the article. My data is not one-hot encoded, so I'm trying to use sparse_categorical_crossentropy as the loss.
These are the shapes of my data. x is RGB images, y is single-channel annotation masks of integer category labels (151 categories). Yes, I'm only using 10 samples of each, just for testing; this will be changed once I can actually get it to start training.
x_train shape: (10, 32, 32, 3)
y_train shape: (10, 32, 32, 1)
x_val shape: (10, 32, 32, 3)
y_val shape: (10, 32, 32, 1)
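To clarify what I mean by "not one-hot encoded": each pixel in y is a single integer class index rather than a 151-long vector. A toy sketch of the difference (made-up values, not my actual data):
import numpy as np
import tensorflow as tf

# sparse labels: one integer class index per pixel (what I have)
y_sparse = np.array([[0, 9],
                     [2, 1]])                # shape (2, 2)

# one-hot labels: a 151-long vector per pixel (what I'd need for
# categorical_crossentropy, and what I'd like to avoid materialising)
y_onehot = tf.one_hot(y_sparse, depth=151)   # shape (2, 2, 151)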
These are the first and last layers of the U-Net.
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
__________________________________________________________________________________________________
... other layers ...
__________________________________________________________________________________________________
conv2d_23 (Conv2D) (None, 32, 32, 151) 453 conv2d_22[0][0]
==================================================================================================
Or from the code:
input_size = (IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS) # 32x32x3
inputs = Input(shape=input_size)
...
output = Conv2D(N_CLASSES, 1, activation='softmax')(conv_dec_4)
The exact error I am getting is:
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [10240,151] and labels shape [1]
[[{{node loss/dense_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
I can see how the logits shape is [32x32xSamples, Number_of_classes], but why is the labels shape just [1]? I cannot find a single search result of someone having a labels shape of [1], so here I am, posting a new question. What am I doing wrong, and what do I need to do to fix it? Please ask if you need any other relevant info.
I would very much prefer to keep using sparse_categorical_crossentropy rather than one-hot encoding my labels and switching to categorical_crossentropy.
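As far as I can tell, the loss itself should accept my label format. Here's a sanity-check sketch with toy shapes, calling the loss directly on integer masks shaped like mine, which raises no shape complaint:
import numpy as np
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

# toy batch: 2 images of 4x4 pixels, 151 classes
y_true = np.random.randint(0, 151, size=(2, 4, 4, 1))     # integer masks, like mine
logits = np.random.rand(2, 4, 4, 151).astype('float32')
y_pred = tf.nn.softmax(logits)                            # per-pixel probabilities

print(loss_fn(y_true, y_pred))  # a scalar loss (a tensor in TF 1.x graph mode)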
Also, here are my Python and package versions:
$ python -V
Python 3.6.7
$ pip list
Package Version
------------------------ -------------------
absl-py 0.12.0
astor 0.8.1
astunparse 1.6.3
attrs 21.2.0
cached-property 1.5.2
cachetools 4.2.2
certifi 2021.5.30
chardet 4.0.0
cycler 0.10.0
dataclasses 0.8
dill 0.3.3
flatbuffers 1.12
future 0.18.2
gast 0.4.0
google-auth 1.30.1
google-auth-oauthlib 0.4.4
google-pasta 0.2.0
googleapis-common-protos 1.53.0
grpcio 1.34.1
h5py 2.9.0
idna 2.10
importlib-metadata 4.4.0
Keras 2.4.3
Keras-Applications 1.0.8
keras-nightly 2.5.0.dev2021032900
Keras-Preprocessing 1.1.2
kiwisolver 1.3.1
Markdown 3.3.4
matplotlib 3.3.4
numpy 1.19.5
oauthlib 3.1.1
opencv-python 3.4.2.16
opt-einsum 3.3.0
Pillow 8.2.0
pip 21.1.2
promise 2.3
protobuf 3.17.2
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyparsing 2.4.7
python-dateutil 2.8.1
PyYAML 5.4.1
requests 2.25.1
requests-oauthlib 1.3.0
rsa 4.7.2
scipy 1.5.4
setuptools 57.0.0
six 1.15.0
tensorboard 1.14.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
tensorflow 1.14.0
tensorflow-datasets 1.3.2
tensorflow-estimator 1.14.0
tensorflow-metadata 1.0.0
termcolor 1.1.0
tqdm 4.61.0
typing-extensions 3.7.4.3
urllib3 1.26.5
Werkzeug 2.0.1
wheel 0.36.2
wrapt 1.12.1
zipp 3.4.1
"""Minimal Example"""
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Input, Conv2D  # explicit imports instead of a wildcard
IMAGE_HEIGHT = 4
IMAGE_WIDTH = 4
IMAGE_CHANNELS = 3
MASK_CHANNELS = 1
N_CLASSES = 151
x_train = np.array([[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
])
x_train = np.expand_dims(x_train, axis=0)
y_train = np.array([[[0], [9], [1], [1]],
[[2], [1], [3], [6]],
[[1], [4], [1], [1]],
[[1], [1], [1], [8]],
])
y_train = np.expand_dims(y_train, axis=0)
x_val = np.array([[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
[[255, 255, 255], [255, 255, 255], [255, 255, 255], [255, 255, 255]],
])
x_val = np.expand_dims(x_val, axis=0)
y_val = np.array([[[0], [9], [1], [1]],
[[2], [1], [3], [6]],
[[1], [4], [1], [1]],
[[1], [1], [1], [8]],
])
y_val = np.expand_dims(y_val, axis=0)
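# sanity check: x should be (1, 4, 4, 3) and y should be (1, 4, 4, 1)
print(x_train.shape, y_train.shape, x_val.shape, y_val.shape)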
inputs = Input(shape=(IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS))
conv_enc_1 = Conv2D(4, 3, padding='same')(inputs)  # 'same' padding keeps the 4x4 spatial size so the logits line up with the masks
output = Conv2D(N_CLASSES, 1, activation='softmax')(conv_enc_1)  # per-pixel softmax over the 151 classes
unet = tf.keras.Model(inputs=inputs, outputs=output)
unet.summary()
unet.compile(optimizer='sgd', loss=tf.keras.losses.SparseCategoricalCrossentropy())
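# this is the call that raises the InvalidArgumentError above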
unet.fit((x_train, y_train), epochs=10, batch_size=1, shuffle=True, verbose=1, validation_data=(x_val, y_val))
Upvotes: 0
Views: 1446
Reputation: 51
As per Dominik Ficek's comment:
unet.fit((x_train, y_train))
should have been:
unet.fit(x_train, y_train)
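If I understand the Keras API correctly, the (x, y) tuple form is only unpacked automatically when the data comes from a generator or a tf.data.Dataset; with plain NumPy arrays, x and y must be passed as separate positional arguments, otherwise the whole tuple is treated as the model input. So the working call for the minimal example is:
unet.fit(x_train, y_train, epochs=10, batch_size=1, shuffle=True, verbose=1,
         validation_data=(x_val, y_val))
Note that validation_data does take an (x, y) tuple, which makes this mistake easy to make.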
Upvotes: 1