Fluxy

Reputation: 2978

How to add a one-hot encoding layer to a TensorFlow model?

I want to add a one-hot encoding layer to a TensorFlow 2 model. This is what I have so far:

import pandas as pd
import tensorflow as tf

# import CSV file to pandas DataFrame called df
# set categorical (CAT_COLUMNS) and numerical (NUM_COLUMNS) features

feature_cols = []

# Create IndicatorColumn for categorical features
for feature in CAT_COLUMNS:
  vocab = df[feature].unique()
  feature_cols.append(tf.feature_column.indicator_column(
      tf.feature_column.categorical_column_with_vocabulary_list(feature, vocab)))

# Create NumericColumn for numerical features
for feature in NUM_COLUMNS:
  feature_cols.append(tf.feature_column.numeric_column(feature, dtype=tf.int32))

print(feature_cols)

How should I use feature_cols in a TensorFlow model so that one-hot encoding is applied to the categorical features only?

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[len(df.columns)]),
    tf.keras.layers.Dense(units=128, activation=tf.nn.relu),
    tf.keras.layers.Dense(units=1, activation=tf.nn.softmax)
])

Upvotes: 0

Views: 4782

Answers (3)

JeeyCi

Reputation: 597

@Fluxy, here is a multi-input example (one input for a categorical feature, one for the numerical features):

import numpy as np
import tensorflow as tf

# Two numerical features per sample
x = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [1, 2], [3, 4], [5, 6], [7, 8]], dtype='float32')
print(x)
# One categorical feature per sample; 3 distinct values = num_tokens in layers.CategoryEncoding
y = np.array([0, 0, 1, 2, 0, 0, 1, 2], dtype='int32')
print(y)

ds = tf.data.Dataset.from_tensor_slices((x, y))   # dataset of (x, y) tuples
print(ds)

# Alternative: one-hot encode inside the tf.data pipeline instead of in the model
#inp = ds.map(lambda x, y: (x, tf.one_hot(y, depth=3)))
#print(list(inp.as_numpy_iterator()))

numerical_input = tf.keras.layers.Input(shape=(2,), dtype=tf.float32)

categorical_input = tf.keras.layers.Input(shape=(1,), dtype=tf.int32)
encoded = tf.keras.layers.CategoryEncoding(num_tokens=3, output_mode="one_hot")(categorical_input)

# Numerical features pass through unchanged; only the categorical input is one-hot encoded
concat = tf.keras.layers.concatenate([numerical_input, encoded])

model = tf.keras.models.Model(inputs=[numerical_input, categorical_input], outputs=[concat])

# Reshape y to (batch, 1) to match the declared categorical input shape
predicted = model.predict([x, y.reshape(-1, 1)])
print(predicted)

model.summary()   # summary() prints the table itself and returns None
# plot_model needs pydot and graphviz to be installed
tf.keras.utils.plot_model(model, show_shapes=True)

P.S. If the categorical feature has so many distinct values that the one-hot vector becomes too high-dimensional, use an embedding layer instead of one-hot encoding before the concatenation.
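
A minimal sketch of that embedding variant, assuming the categories are already integer codes. The vocabulary size of 1000 and the embedding width of 8 are hypothetical values chosen for illustration, not taken from the question:

```python
import numpy as np
import tensorflow as tf

NUM_CATEGORIES = 1000   # hypothetical number of distinct category codes
EMBEDDING_DIM = 8       # hypothetical size of the learned dense vector

numerical_input = tf.keras.layers.Input(shape=(2,), dtype="float32")
categorical_input = tf.keras.layers.Input(shape=(1,), dtype="int32")

# Each integer code is mapped to a trainable vector of length EMBEDDING_DIM
embedded = tf.keras.layers.Embedding(
    input_dim=NUM_CATEGORIES, output_dim=EMBEDDING_DIM)(categorical_input)
# Embedding yields (batch, 1, EMBEDDING_DIM); flatten to (batch, EMBEDDING_DIM)
flat = tf.keras.layers.Flatten()(embedded)

concat = tf.keras.layers.concatenate([numerical_input, flat])
model = tf.keras.models.Model(
    inputs=[numerical_input, categorical_input], outputs=concat)

x_num = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32")
x_cat = np.array([[5], [999]], dtype="int32")
out = model.predict([x_num, x_cat], verbose=0)
print(out.shape)
```

The output width is 2 + EMBEDDING_DIM regardless of how many categories exist, whereas a one-hot encoding would add NUM_CATEGORIES columns.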

Upvotes: 0

DachuanZhao

Reputation: 1349

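Use the preprocessing layers in tf.keras.layers.experimental.preprocessing (in newer TensorFlow 2 releases the same layers are available directly under tf.keras.layers).

See https://www.tensorflow.org/tutorials/structured_data/preprocessing_layers for an example.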
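
As a rough illustration of what such a preprocessing layer does, here is a small sketch that one-hot encodes a string feature with StringLookup. It assumes a recent TensorFlow 2 release where the layer lives under tf.keras.layers and accepts output_mode="one_hot"; the colour vocabulary is made up, and in the question it could come from df[feature].unique():

```python
import numpy as np
import tensorflow as tf

# Hypothetical vocabulary; in the question this could be df[feature].unique()
vocab = ["red", "green", "blue"]

# With the default num_oov_indices=1, index 0 is reserved for unknown values,
# so each one-hot vector has len(vocab) + 1 entries
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode="one_hot")

values = tf.constant([["red"], ["blue"], ["purple"]])   # "purple" is not in vocab
encoded = np.array(lookup(values))
print(encoded)
```

Because the layer is an ordinary Keras layer, it can be applied to a string-typed model input, so the encoding travels with the saved model instead of living in separate pandas code.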

Upvotes: 1

happymacaron

Reputation: 490

I think you can provide the categorical and the numerical features as separate inputs and use tf.keras.layers.Concatenate to combine them.
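
A sketch of how that could look as a trainable model, assuming two numerical features, one integer-coded categorical feature with three possible values, and a three-class target. All shapes, sizes, and the toy data are assumptions for illustration:

```python
import numpy as np
import tensorflow as tf

NUM_TOKENS = 3    # hypothetical number of distinct category codes
NUM_CLASSES = 3   # hypothetical number of target classes

numerical_input = tf.keras.Input(shape=(2,), dtype="float32", name="numerical")
categorical_input = tf.keras.Input(shape=(1,), dtype="int32", name="categorical")

# Only the categorical branch is one-hot encoded
one_hot = tf.keras.layers.CategoryEncoding(
    num_tokens=NUM_TOKENS, output_mode="one_hot")(categorical_input)
merged = tf.keras.layers.Concatenate()([numerical_input, one_hot])

hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
# Softmax needs one unit per class; with a single unit it always outputs 1.0
output = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = tf.keras.Model(inputs=[numerical_input, categorical_input], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy data, made up for the example
x_num = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype="float32")
x_cat = np.array([[0], [1], [2], [0]], dtype="int32")
labels = np.array([0, 1, 2, 0], dtype="int32")

model.fit([x_num, x_cat], labels, epochs=1, verbose=0)
probs = model.predict([x_num, x_cat], verbose=0)
print(probs.shape)
```

The inputs are passed to fit and predict in the same order as they were listed in the Model constructor.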

Upvotes: 0
