Reputation: 11
I'm new to TensorFlow and was following this tutorial, https://www.tensorflow.org/tutorials/structured_data/feature_columns, using CSV data from my local drive. I could load the CSV file and print the column names with:
for feature_batch, label_batch in train_ds.take(1):
    print('Every feature:', list(feature_batch.keys()))
    print('A batch of traffic_type', label_batch)
When I tried to create an embedding feature column with:
_mt_datetime_embedding = feature_column.embedding_column(_mt_datetime, dimension=8)
demo(_mt_datetime_embedding)
This error showed up:
AttributeError: 'EmbeddingColumn' object has no attribute 'num_buckets'
I don't know what is wrong. Could someone please help me? Many thanks.
Upvotes: 1
Views: 869
According to the TensorFlow documentation on embedding columns:
Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grow large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an embedding column represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1.
Using an embedding column is best when a categorical column has many possible values.
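To make the one-hot versus embedding contrast concrete, here is a minimal pure-Python sketch. It is only an illustration, not how TensorFlow implements embeddings: the names VOCAB, EMBEDDING_TABLE, one_hot and embed are made up for this example, and the table holds seeded random numbers where a real model would hold learned weights.

```python
import random

# Illustrative vocabulary (an assumption for this sketch).
VOCAB = ['fixed', 'normal', 'reversible']
DIMENSION = 8  # width of each dense vector, like dimension=8 in embedding_column


def one_hot(value):
    """One slot per category: 1.0 at the matching position, 0.0 elsewhere."""
    return [1.0 if value == v else 0.0 for v in VOCAB]


# One row of floats per category. Training would learn these values;
# a seeded random table stands in for them here.
rng = random.Random(0)
EMBEDDING_TABLE = [
    [rng.uniform(-1.0, 1.0) for _ in range(DIMENSION)] for _ in VOCAB
]


def embed(value):
    """Look up the dense vector for a category by its vocabulary index."""
    return EMBEDDING_TABLE[VOCAB.index(value)]


print(one_hot('normal'))     # length grows with the vocabulary size
print(len(embed('normal')))  # length stays at DIMENSION
```

The one-hot vector needs one slot per category, so it grows with the vocabulary, while the embedding width stays fixed at DIMENSION. Every occurrence of the same category maps to the same dense row.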
The input to tf.feature_column.embedding_column must be a CategoricalColumn created by one of the categorical_column_* functions.
Syntax:
tf.feature_column.embedding_column(
    categorical_column, dimension, combiner='mean', initializer=None,
    ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True,
    use_safe_embedding_lookup=True
)
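When I passed a numeric_column instead of a categorical column, I received AttributeError: 'NumericColumn' object has no attribute 'num_buckets'. The class named in the message is the type of the object that was passed as the categorical column, so your 'EmbeddingColumn' error most likely means that _mt_datetime was itself an embedding column rather than a categorical one.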
age_embedding = feature_column.embedding_column(age, dimension=8)
demo(age_embedding)
Output:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-94a5fc74016e> in <module>()
1 age_embedding = feature_column.embedding_column(age, dimension=8)
----> 2 demo(age_embedding)
4 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/feature_column/feature_column_v2.py in create_state(self, state_manager)
3181 """Creates the embedding lookup variable."""
3182 default_num_buckets = (self.categorical_column.num_buckets
-> 3183 if self._is_v2_column
3184 else self.categorical_column._num_buckets) # pylint: disable=protected-access
3185 num_buckets = getattr(self.categorical_column, 'num_buckets',
AttributeError: 'NumericColumn' object has no attribute 'num_buckets'
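The traceback shows the embedding column reading num_buckets from the column it wraps, which is why any column without that attribute fails. The sketch below mimics that check using stand-in classes. NumericColumn and VocabularyListColumn here are hypothetical toy classes written for this example, not TensorFlow's real ones, and embedding_table_shape is an invented helper.

```python
class NumericColumn:
    """Stand-in for a numeric column: it has a key but no bucket count."""

    def __init__(self, key):
        self.key = key


class VocabularyListColumn:
    """Stand-in for a categorical column: one bucket per vocabulary entry."""

    def __init__(self, key, vocabulary):
        self.key = key
        self.vocabulary = list(vocabulary)

    @property
    def num_buckets(self):
        return len(self.vocabulary)


def embedding_table_shape(column, dimension):
    """Return (rows, columns) of the lookup table an embedding would need.

    Raises a TypeError with a clearer message when the column is not categorical.
    """
    if not hasattr(column, 'num_buckets'):
        raise TypeError(
            f"{type(column).__name__} has no num_buckets; "
            "pass a categorical column instead"
        )
    return (column.num_buckets, dimension)


thal = VocabularyListColumn('thal', ['fixed', 'normal', 'reversible'])
print(embedding_table_shape(thal, 8))

try:
    embedding_table_shape(NumericColumn('age'), 8)
except TypeError as err:
    print(err)
```

An embedding needs one table row per category, so the wrapped column has to report how many categories it has. A numeric column, or an embedding column passed in by mistake, has no such count.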
When I passed a categorical column instead, it was converted to a dense representation. Here is the complete code:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop('target')
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds
batch_size = 5  # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
example_batch = next(iter(train_ds))[0]
def demo(feature_column):
    feature_layer = layers.DenseFeatures(feature_column)
    print(feature_layer(example_batch).numpy())
age = feature_column.numeric_column("age")
thal = feature_column.categorical_column_with_vocabulary_list(
    'thal', ['fixed', 'normal', 'reversible'])
thal_embedding = feature_column.embedding_column(thal, dimension=8)
demo(thal_embedding)
Output:
193 train examples
49 validation examples
61 test examples
[[-0.4675103 0.61985296 0.06297898 0.00818724 0.05449321 -0.6865342
-0.05250816 -0.13339798]
[-0.4675103 0.61985296 0.06297898 0.00818724 0.05449321 -0.6865342
-0.05250816 -0.13339798]
[-0.4675103 0.61985296 0.06297898 0.00818724 0.05449321 -0.6865342
-0.05250816 -0.13339798]
[ 0.3212179 0.29932576 -0.44579896 -0.4998746 0.064592 0.16934885
0.02404759 0.5051637 ]
[-0.4675103 0.61985296 0.06297898 0.00818724 0.05449321 -0.6865342
-0.05250816 -0.13339798]]
For more details please refer here
Upvotes: 1