Harry

Reputation: 11

How to define an embedding column in tensorflow 2.0?

I'm new to TensorFlow and am following the tutorial at https://www.tensorflow.org/tutorials/structured_data/feature_columns using my own CSV data from a local drive. I can load the CSV file and print the column names with:

for feature_batch, label_batch in train_ds.take(1):
  print('Every feature:', list(feature_batch.keys()))
  print('A batch of traffic_type',label_batch)

When I tried to create an embedding feature column with

_mt_datetime_embedding = feature_column.embedding_column(_mt_datetime, dimension=8)
demo(_mt_datetime_embedding)

this error showed up:

AttributeError: 'EmbeddingColumn' object has no attribute 'num_buckets'

I don't know what is wrong. Could someone please help me? Many thanks.

Upvotes: 1

Views: 869

Answers (1)

user11530462

According to the TensorFlow documentation on embedding columns:

Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grow large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1.

Using an embedding column is best when a categorical column has many possible values.
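
To make the quoted description concrete, here is a minimal pure-Python sketch (no TensorFlow) of the idea: an embedding is a table with one dense row per category, and looking a category up gives the same result as multiplying its one-hot vector by that table. The vocabulary, the dimension of 4, the random initial values, and the helper names one_hot, embed and one_hot_times_table are illustrative assumptions, not part of the TensorFlow API.

```python
import random

# Hypothetical vocabulary, borrowed from the 'thal' feature for illustration.
vocab = ['fixed', 'normal', 'reversible']
dimension = 4  # illustrative size of the dense vector

random.seed(0)
# One dense row per category. In a real model these values are trainable weights.
table = [[random.uniform(-1.0, 1.0) for _ in range(dimension)] for _ in vocab]

def one_hot(value):
    """Sparse representation: as long as the vocabulary, with a single 1."""
    return [1.0 if v == value else 0.0 for v in vocab]

def embed(value):
    """Dense representation: the table row that belongs to the category."""
    return table[vocab.index(value)]

def one_hot_times_table(value):
    """Multiplying the one-hot vector by the table selects the same row."""
    oh = one_hot(value)
    return [sum(oh[i] * table[i][j] for i in range(len(vocab)))
            for j in range(dimension)]

print(one_hot('normal'))  # length grows with the vocabulary
print(embed('normal'))    # length stays at `dimension`
```

The one-hot vector gets longer with every new category, while the embedded vector keeps the chosen dimension. The same category always maps to the same row.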

The input to tf.feature_column.embedding_column must be a CategoricalColumn created by one of the categorical_column_* functions.

Syntax:

tf.feature_column.embedding_column(
    categorical_column, dimension, combiner='mean', initializer=None,
    ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True,
    use_safe_embedding_lookup=True
)
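
The combiner argument in the signature controls how several embedding vectors for a single example are reduced to one. The following is a rough pure-Python sketch of that reduction, assuming unweighted ids and my understanding of the documented 'mean', 'sum' and 'sqrtn' options. The function name combine is hypothetical and this is not TensorFlow's implementation.

```python
import math

def combine(vectors, combiner='mean'):
    """Reduce several embedding vectors for one example to a single vector."""
    n = len(vectors)
    # Component-wise sum across the vectors.
    summed = [sum(component) for component in zip(*vectors)]
    if combiner == 'sum':
        return summed
    if combiner == 'mean':
        return [s / n for s in summed]
    if combiner == 'sqrtn':
        # Assumption: with unit weights this is the sum divided by sqrt(n).
        return [s / math.sqrt(n) for s in summed]
    raise ValueError(f'unsupported combiner: {combiner}')

rows = [[1.0, 2.0], [3.0, 4.0]]
print(combine(rows, 'mean'))
print(combine(rows, 'sum'))
```

For a feature with exactly one value per example, such as thal in the heart dataset, all three options return the single embedding row unchanged.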

When I passed a numeric_column instead of a categorical column, I received a similar error, AttributeError: 'NumericColumn' object has no attribute 'num_buckets':

age_embedding = feature_column.embedding_column(age, dimension=8)
demo(age_embedding)

Output:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-94a5fc74016e> in <module>()
      1 age_embedding = feature_column.embedding_column(age, dimension=8)
----> 2 demo(age_embedding)

4 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/feature_column/feature_column_v2.py in create_state(self, state_manager)
   3181     """Creates the embedding lookup variable."""
   3182     default_num_buckets = (self.categorical_column.num_buckets
-> 3183                            if self._is_v2_column
   3184                            else self.categorical_column._num_buckets)   # pylint: disable=protected-access
   3185     num_buckets = getattr(self.categorical_column, 'num_buckets',

AttributeError: 'NumericColumn' object has no attribute 'num_buckets'
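
The traceback shows the wrapped column being asked for num_buckets, an attribute that only categorical columns provide. A defensive check before building the embedding could be sketched as below. The helper name require_categorical and the two stand-in classes are hypothetical, used only so the snippet runs without TensorFlow. With real feature columns you would pass the column object itself.

```python
class FakeNumericColumn:
    """Stand-in for a numeric column: it has no num_buckets attribute."""
    def __init__(self, key):
        self.key = key

class FakeCategoricalColumn:
    """Stand-in for a categorical column: it exposes num_buckets."""
    def __init__(self, key, vocabulary):
        self.key = key
        self.num_buckets = len(vocabulary)

def require_categorical(column):
    """Fail early, with a clearer message, if the column cannot be embedded."""
    if not hasattr(column, 'num_buckets'):
        raise TypeError(
            f'{type(column).__name__} has no num_buckets; '
            'pass a categorical column to embedding_column instead')
    return column

age = FakeNumericColumn('age')
thal = FakeCategoricalColumn('thal', ['fixed', 'normal', 'reversible'])

print(require_categorical(thal).num_buckets)
try:
    require_categorical(age)
except TypeError as err:
    print(err)
```

The same check would also reject a column that has already been wrapped in an embedding, which is what the 'EmbeddingColumn' variant of the error in the question suggests.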

When the input is a categorical column, embedding_column converts it to a dense representation. Here is the complete code:

import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds

batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)

example_batch = next(iter(train_ds))[0]

def demo(feature_column):
  feature_layer = layers.DenseFeatures(feature_column)
  print(feature_layer(example_batch).numpy())

age = feature_column.numeric_column("age")

thal = feature_column.categorical_column_with_vocabulary_list(
      'thal', ['fixed', 'normal', 'reversible'])

thal_embedding = feature_column.embedding_column(thal, dimension=8)
demo(thal_embedding)

Output:

193 train examples
49 validation examples
61 test examples

[[-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
  -0.05250816 -0.13339798]
 [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
  -0.05250816 -0.13339798]
 [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
  -0.05250816 -0.13339798]
 [ 0.3212179   0.29932576 -0.44579896 -0.4998746   0.064592    0.16934885
   0.02404759  0.5051637 ]
 [-0.4675103   0.61985296  0.06297898  0.00818724  0.05449321 -0.6865342
  -0.05250816 -0.13339798]]

For more details please refer here

Upvotes: 1
