Reputation: 1815
I have class labels as:
y = ["class1", "class2", "class3"]
To use them in a model, I want to convert these classes to numeric indices with methods from Keras and/or TensorFlow 2.0.
What I am doing currently is:
import tensorflow as tf

tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(y)
y_train = tokenizer.texts_to_sequences(y)
I know that the tokenizer is kind of misused here. Is there a better, more concise way to convert class labels to indices? Thanks.
Upvotes: 2
Views: 1209
Reputation: 36604
You can't use a Tokenizer for this, because the Tokenizer's indexing starts at 1, not 0. You can use tf.where:
import tensorflow as tf
y = ['class3', 'class1', 'class1', 'class2', 'class3', 'class1', 'class2']
names = ["class1", "class2", "class3"]
labeler = lambda x: tf.where(tf.equal(x, names))
dataset = tf.data.Dataset.from_tensor_slices(y).map(labeler)
next(iter(dataset))
<tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[2]], dtype=int64)>
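Note that each mapped label comes out with shape (1, 1). If your loss expects scalar integer targets, you can squeeze it inside the same map call; continuing the snippet above:

# squeeze the (1, 1) index tensor down to a scalar label
dataset = tf.data.Dataset.from_tensor_slices(y).map(
    lambda x: tf.squeeze(tf.where(tf.equal(x, names))))

print([int(label) for label in dataset])  # [2, 0, 0, 1, 2, 0, 1]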
If you want to do it on a list or Numpy array you can use Scikit-Learn:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit_transform(y)
array([2, 0, 0, 1, 2, 0, 1], dtype=int64)
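If you'd rather stay entirely within Keras, newer TensorFlow releases (2.4+, or tf.keras.layers.experimental.preprocessing in earlier 2.x) ship a StringLookup layer that does this 0-based mapping directly. A minimal sketch, assuming that layer is available in your install:

import tensorflow as tf

names = ["class1", "class2", "class3"]
y = ['class3', 'class1', 'class1', 'class2', 'class3', 'class1', 'class2']

# num_oov_indices=0 keeps the mapping 0-based (no slot reserved for unknown labels)
lookup = tf.keras.layers.StringLookup(vocabulary=names, num_oov_indices=0)
print(lookup(tf.constant(y)).numpy())  # [2 0 0 1 2 0 1]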
As I said previously, your implementation started indexing at 1:
[[2], [1], [1], [3], [2], [1], [3]]
This crashes Keras when it measures loss and metrics: it will return nan because you'll have three final neurons but targets starting at the 2nd position and running to the 4th. tl;dr: don't use indexing that starts at 1 with Keras.
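To make the failure mode concrete, here's a small sketch of my own with a 3-neuron softmax output: a 0-based target works, while a 1-based target that reaches 3 falls outside the valid range [0, 3) and produces an error or NaN depending on the device:

import tensorflow as tf

probs = tf.constant([[0.2, 0.3, 0.5]])           # output of a 3-neuron softmax head
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

print(loss_fn(tf.constant([2]), probs).numpy())  # 0-based label: fine, -log(0.5) ~= 0.693
try:
    # 1-based label "3" is outside [0, 3): error on CPU, NaN loss on GPU
    print(loss_fn(tf.constant([3]), probs).numpy())
except tf.errors.InvalidArgumentError as err:
    print("invalid label:", type(err).__name__)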
Upvotes: 1