Reputation: 41
I am working on a special case of a convolutional network for solving puzzles, parts of which will later be reused to classify images. Currently I am setting up the final layer of the puzzle part.
Each puzzle consists of 9 pieces, and y_hat is an 81-long (9x9) NumPy array containing the original position of each piece.
For the network I use a TensorFlow functional model, where the final layer consists of 9 small sub-models, each ending in a softmax that indicates where its piece should go. Is there any way to make the last layer of the sub-models output 1 for the highest value after the softmax and 0 for the rest? I have been searching for days now. This still has to be part of the neural network, so it can be used during training.
What I mean is:
[0.01,0.41,0.02,0.32,0.01,0.43] => [0,0,0,0,0,1]
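For concreteness, a rough sketch of the head layout I have in mind (the trunk and its shapes below are just placeholders, not my real architecture):

import tensorflow as tf

# Placeholder trunk producing a shared feature vector.
inputs = tf.keras.Input(shape=(576,))
features = tf.keras.layers.Dense(256, activation='relu')(inputs)

# One small softmax head per puzzle piece, each predicting one of 9 positions.
heads = [tf.keras.layers.Dense(9, activation='softmax', name=f'piece_{i}')(features)
         for i in range(9)]

# Concatenated, the 9 heads form the 81-long (9x9) output that y_hat matches.
outputs = tf.keras.layers.Concatenate()(heads)
model = tf.keras.Model(inputs, outputs)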
Upvotes: 1
Views: 808
Reputation: 1658
If you just want to classify things, then use from_logits=True in tf.keras.losses.CategoricalCrossentropy
[1] or tf.keras.losses.SparseCategoricalCrossentropy
[2], and make the output a plain tf.keras.layers.Dense
layer with no activation. The softmax calculation is done inside these loss functions.
Folding the softmax into the loss this way is more efficient (and numerically more stable), so you don't need to build your own one-hot layer for the output of your network; just set from_logits=True.
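A minimal sketch of what I mean (the input shape and optimizer here are just placeholders, not the asker's actual setup):

import tensorflow as tf

# A single 9-way head: the Dense layer has no activation, so it outputs raw logits.
inputs = tf.keras.Input(shape=(128,))
logits = tf.keras.layers.Dense(9)(inputs)  # no softmax here
model = tf.keras.Model(inputs, logits)

# The loss applies the softmax internally because from_logits=True.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))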
If you want your own one-hot vector converter layer, here is an idea that keeps the graph differentiable: use a very large scaling parameter, known as the thermodynamic beta, inside the softmax.
import tensorflow as tf

class OneHot(tf.keras.layers.Layer):
    def __init__(self, infi=1e9):
        super(OneHot, self).__init__()
        self.infi = infi

    def call(self, x):
        # x has shape [B, 9]; the huge scale pushes the softmax toward a one-hot vector
        return tf.nn.softmax(self.infi * x)

    def get_config(self):
        return {'infi': self.infi}
See how it works below.
>>> a = tf.constant([[1., 2.], [3., 4.]], dtype=tf.float32)
>>> tf.nn.softmax(a)
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0.26894143, 0.7310586 ],
       [0.26894143, 0.7310586 ]], dtype=float32)>
>>> tf.nn.softmax(1e9 * a)
<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[0., 1.],
       [0., 1.]], dtype=float32)>
I am not sure how this output layer makes learning better.
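If you want to try it anyway, the layer simply sits on top of each head, for example (a sketch; the trunk shape is a placeholder):

inputs = tf.keras.Input(shape=(128,))
scores = tf.keras.layers.Dense(9)(inputs)  # raw scores for one piece
hard = OneHot(infi=1e9)(scores)            # (almost) one-hot, stays inside the graph
model = tf.keras.Model(inputs, hard)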
== My first answer below. Not recommended.
You can use tf.one_hot
[3] to build it.
import tensorflow as tf
x = tf.constant([0.01,0.41,0.02,0.32,0.01,0.43], dtype=tf.float32)
i = tf.argmax(x)
y = tf.one_hot(i, 6)
# <tf.Tensor: shape=(6,), dtype=float32, numpy=array([0., 0., 0., 0., 0., 1.], dtype=float32)>
If you want to make it a Keras layer, write a custom layer [4].
import tensorflow as tf

class OneHot(tf.keras.layers.Layer):
    def __init__(self):
        super(OneHot, self).__init__()

    def call(self, x):
        i = tf.argmax(x, axis=1)  # x has shape [B, 9]
        return tf.one_hot(i, 9)
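Note that tf.argmax is not differentiable, so no gradients flow through this layer during training; that is why this version is not recommended and is only useful for inference or post-processing.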
[1] https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy
[2] https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
[3] https://www.tensorflow.org/api_docs/python/tf/one_hot
[4] https://www.tensorflow.org/guide/keras/custom_layers_and_models
Upvotes: 2
Reputation: 228
index = tf.argmax(one_hot_vector, axis=0)
Searching with "one hot decode" may help you.
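For example, a small sketch of what that line returns:

import tensorflow as tf

one_hot_vector = tf.constant([0., 0., 0., 0., 0., 1.])
index = tf.argmax(one_hot_vector, axis=0)
# <tf.Tensor: shape=(), dtype=int64, numpy=5>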
Upvotes: -1