elhoudev

Reputation: 91

Multi-hot encoding in TensorFlow using tf.data.Dataset

I have a problem with the TensorFlow API tf.data.Dataset.from_tensor_slices().

The code below works well :

features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
                "letter", ["A", "B", "C"], dtype=tf.string)

target = [1,0,1,0,1]

indicator = tf.feature_column.indicator_column(letter_feature)

def make_input_fn (X,y):
    def input_fn():
        return (X,y)
    return input_fn

# The input function returns a tuple: ( {'letter':[['A','A'],['C','D']...]}, [1,0,...] )

linear_estimator = tf.estimator.LinearClassifier(indicator)
input_fn = make_input_fn(features, target)

linear_estimator.train(input_fn)

This lets me feed a column of shape (-1, 2), i.e. two letters per example, to my estimator through the indicator feature column.
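To make the expected encoding concrete, here is a minimal pure-Python sketch of what I understand the indicator column to compute for a multi-valued feature: one count per vocabulary entry, with out-of-vocabulary letters dropped. `multi_hot` is a hypothetical helper written for illustration, not a TensorFlow function.

```python
def multi_hot(rows, vocabulary):
    """Count how often each vocabulary entry appears in every row."""
    index = {token: i for i, token in enumerate(vocabulary)}
    encoded = []
    for row in rows:
        counts = [0] * len(vocabulary)
        for token in row:
            # Letters outside the vocabulary are ignored (assumed to match
            # the default behaviour with no out-of-vocabulary buckets).
            if token in index:
                counts[index[token]] += 1
        encoded.append(counts)
    return encoded


rows = [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]
encoded = multi_hot(rows, ["A", "B", "C"])
print(encoded)
```

With the vocabulary ["A", "B", "C"], only the letters A and C in the sample data contribute to the encoded vectors; every other letter is dropped.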

Now I have an issue with the following use case:

df_features = pd.DataFrame.from_dict(features)

######### this is the dataframe features####
#letter
#[A, A, A]
#[B, C, D]
#[B, E, F]
#[B, G, A]
#[B, X, R]

def make_input_fn (X,y):
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(X),y))
        ds = ds.shuffle(128)
        return ds
    return input_fn

linear_estimator = tf.estimator.LinearClassifier(indicator)
input_fn = make_input_fn(df_features,target)

linear_estimator.train(input_fn)

I end up getting this error :


TypeError: Could not build a TypeSpec for 0    [A, A, A]
1    [B, C, D]
2    [B, E, F]
3    [B, G, A]
4    [B, X, R]
Name: letter, dtype: object with type Series ...
TypeError: Expected binary or unicode string, got ['A', 'A', 'A']

This is a real problem: with a large dataset I will need the tf.data.Dataset API to feed my estimator in small batches and, eventually, to distribute training.

I need a workaround. I thought about generators, but I'm not sure how to implement that yet, and I wanted to check first whether there is a simpler solution.

Thank you!

Upvotes: 1

Views: 1441

Answers (1)

user11530462

Elaborating Richard_wth's comment for the benefit of the community.

The error "TypeError: Expected binary or unicode string, got ['A', 'A', 'A']" can be resolved with the following changes:

1. Build the dataset with one-hot labels: tf.data.Dataset.from_tensor_slices((dict(X), tf.one_hot(y, depth=2)))
2. Pass the plain dict instead of the DataFrame: input_fn = make_input_fn(features,target)
3. Limit the number of training steps: linear_estimator.train(input_fn, steps=2)
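If the data already lives in a DataFrame, one way to get the same effect as change 2 is to turn each column back into a plain nested list before slicing. The sketch below assumes every row holds the same number of letters, since a ragged list cannot become a single dense tensor. `columns_to_lists` is a hypothetical helper, not part of pandas or TensorFlow.

```python
def columns_to_lists(columns):
    """Return {column name: list of row lists} for a dict or a pandas DataFrame."""
    plain = {}
    for name, col in columns.items():
        rows = [list(row) for row in col]
        # All rows must have the same width to form a rectangular tensor.
        if len({len(row) for row in rows}) > 1:
            raise ValueError(f"column {name!r} has rows of different lengths")
        plain[name] = rows
    return plain


features = {'letter': [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]}
plain = columns_to_lists(features)
# With a DataFrame this would be: plain = columns_to_lists(df_features)
```

The resulting dict can then be handed to tf.data.Dataset.from_tensor_slices in place of dict(X).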

Complete working code is shown below:

import pandas as pd
import tensorflow as tf

features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

df_features = pd.DataFrame.from_dict(features)

# df_features holds one list per row (it is not passed to the input function below):
#   letter
# 0 [A, A]
# 1 [C, D]
# 2 [E, F]
# 3 [G, A]
# 4 [X, R]

letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
                "letter", ["A", "B", "C"], dtype=tf.string)


indicator = tf.feature_column.indicator_column(letter_feature)

target = [1,0,1,0,1]

def make_input_fn (X,y):
    def input_fn():
        ds = tf.data.Dataset.from_tensor_slices((dict(X), tf.one_hot(y, depth=2)))
        ds = ds.shuffle(128)
        return ds
    return input_fn

linear_estimator = tf.estimator.LinearClassifier(indicator)

input_fn = make_input_fn(features,target)

linear_estimator.train(input_fn, steps=2)
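The question also raised generators as a possible route. A rough sketch of that alternative follows, assuming two letters per row and integer labels. The names `make_row_generator` and `make_generator_input_fn` and the parameters `row_width` and `batch_size` are made up for illustration; only tf.data.Dataset.from_generator and tf.TensorSpec are TensorFlow APIs. I have not benchmarked it against the version above.

```python
def make_row_generator(columns, labels):
    """Yield ({column name: row values}, label) pairs one example at a time."""
    def gen():
        # .items() works for a plain dict and for a pandas DataFrame alike.
        names, cols = zip(*columns.items())
        for values, label in zip(zip(*cols), labels):
            yield {n: list(v) for n, v in zip(names, values)}, label
    return gen


def make_generator_input_fn(columns, labels, row_width=2, batch_size=2):
    def input_fn():
        # Imported here so the generator above has no TensorFlow dependency.
        import tensorflow as tf
        signature = (
            {name: tf.TensorSpec(shape=(row_width,), dtype=tf.string)
             for name in columns},
            tf.TensorSpec(shape=(), dtype=tf.int32),
        )
        ds = tf.data.Dataset.from_generator(
            make_row_generator(columns, labels), output_signature=signature)
        return ds.shuffle(128).batch(batch_size)
    return input_fn


features = {'letter': [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]}
target = [1, 0, 1, 0, 1]
examples = list(make_row_generator(features, target)())
```

Because rows are produced one at a time, the whole dataset never has to be converted to a single tensor up front, which is the point of using a generator for larger data.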

Happy Learning!

Upvotes: 1
