Brian F

Reputation: 107

Feeding tf.contrib.learn inputs into DNNClassifier

I am new to TensorFlow and Stack Overflow, so apologies in advance for any silly errors. I've had good success feeding data through the lower-level interfaces, so I decided to give the tf.contrib.learn higher-level APIs a try because they looked so easy. I'm working in Google Cloud Datalab (a Jupyter notebook), but I've hit a roadblock and am looking for help.

Main question: how do I instantiate a DNNClassifier so that I can feed it a feature that is itself a list of tf.float32 numbers?

Here are the details. I am reading a TFRecords based input file with the following code:

def read_and_decode(filename_queue):  
    # get a tensorflow reader and read in an example
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)

    # parse a single example
    features = tf.parse_single_example(serialized_example, features={ 
               'label': tf.FixedLenFeature([], tf.int64),
               'features': tf.FixedLenFeature([], tf.string)} )

    # convert to tensors and return
    bag_of_words = tf.decode_raw(features['features'], tf.float32)
    bag_of_words.set_shape([LEN_OF_LEXICON])
    label = tf.cast(features['label'], tf.int32) 

    return bag_of_words, label
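For context, tf.decode_raw(..., tf.float32) on a string feature implies the TFRecords were written with the raw bytes of a float32 array. Here is a minimal writer-side sketch of what such a record presumably looks like (the make_example helper is hypothetical, not part of the original pipeline):

import numpy as np
import tensorflow as tf

def make_example(bag_of_words, label):
    # bag_of_words: a np.float32 vector of length LEN_OF_LEXICON
    return tf.train.Example(features=tf.train.Features(feature={
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
        'features': tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[bag_of_words.astype(np.float32).tostring()]))}))

# with tf.python_io.TFRecordWriter(filename) as writer:
#     writer.write(make_example(vec, lbl).SerializeToString())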

My unit test of read_and_decode looks like this:

# unit test
filename = VALIDATION_FILE
my_filename_queue = tf.train.string_input_producer([filename],
                                                   num_epochs=1)
x, y = read_and_decode(my_filename_queue)
print ('x[0] -> ', x[0])
print ('x[1] -> ', x[1])
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))

It gives the following output:

x[0] ->  Tensor("strided_slice_6:0", shape=(), dtype=float32)
x[1] ->  Tensor("strided_slice_7:0", shape=(), dtype=float32)
y ->  Tensor("Cast_6:0", shape=(), dtype=int32) type ->  <class 'tensorflow.python.framework.ops.Tensor'>
x ->  Tensor("DecodeRaw_3:0", shape=(2633,), dtype=float32) type ->  <class 'tensorflow.python.framework.ops.Tensor'>
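Note that these prints only show the symbolic tensors; to pull actual values out of the queue you need a session with queue runners started. A minimal sketch (the same pattern works for the batched pipeline below):

with tf.Session() as sess:
    # local init is needed because num_epochs creates a local variable
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    x_val, y_val = sess.run([x, y])  # one decoded example
    print(x_val.shape, y_val)        # (2633,) and an int32 label
    coord.request_stop()
    coord.join(threads)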

The read_and_decode function is called by input_pipeline, which has the following definition and unit test:

def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)

    example, label = read_and_decode(filename_queue)

    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)

    return example_batch, label_batch

# unit test
x, y = input_pipeline([VALIDATION_FILE], BATCH_SIZE, num_epochs=1)
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))

It produces the following output:

y ->  Tensor("shuffle_batch_4:1", shape=(100,), dtype=int32) type ->  <class 'tensorflow.python.framework.ops.Tensor'>
x ->  Tensor("shuffle_batch_4:0", shape=(100, 2633), dtype=float32) type ->  <class 'tensorflow.python.framework.ops.Tensor'>

The trainer that will take these feeds looks like this:

def run_training():
    # feature_columns = ????????????
    feature_columns = tf.contrib.layers.real_valued_column(
        "", dimension=LEN_OF_LEXICON, dtype=tf.float32)
    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))

    estimator.fit(input_fn=lambda: input_pipeline(
        [VALIDATION_FILE], BATCH_SIZE, num_epochs=1))

# unit test
run_training()

The instantiation of DNNClassifier passes fine, but the call to estimator.fit() throws an exception (traceback snippet below). My input_pipeline is supplying the feed as shown in the TensorFlow documentation, but somehow the form of the data inside the tensor does not appear to be correct. Does anyone have any thoughts on this?

---------------- Traceback Snippet -----------------
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/dnn.pyc in _dnn_model_fn(features, labels, mode, params, config)
    126         feature_columns=feature_columns,
    127         weight_collections=[parent_scope],
--> 128         scope=scope)
    129 
    130   hidden_layer_partitioner = (
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope)
    247                                      scope,
    248                                      output_rank=2,
--> 249                                      default_name='input_from_feature_columns')
    250 
    251 
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in _input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope, output_rank, default_name)
    145                                 default_name):
    146   """Implementation of `input_from(_sequence)_feature_columns`."""
--> 147   check_feature_columns(feature_columns)
    148   with variable_scope.variable_scope(scope,
    149                                      default_name=default_name,
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in check_feature_columns(feature_columns)
    806   seen_keys = set()
    807   for f in feature_columns:
--> 808     key = f.key
    809     if key in seen_keys:
    810       raise ValueError('Duplicate feature column key found for column: {}. '
AttributeError: 'str' object has no attribute 'key'

Upvotes: 2

Views: 732

Answers (1)

Brian F

Reputation: 107

The solution is to use the function:

feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input_fn(
    lambda: input_pipeline([INPUT_FILE], BATCH_SIZE, num_epochs=1))

It infers the feature columns from the output signature of your input_fn, so you don't have to build them by hand. (The original error most likely occurred because DNNClassifier expects an iterable of feature columns; passing a single real_valued_column makes check_feature_columns iterate over the column's fields, the first of which is the name string, hence 'str' object has no attribute 'key'.) Easy peasy!
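Putting it together, a sketch of the corrected run_training (same hyperparameters as in the question; only the feature_columns construction changes):

def run_training():
    input_fn = lambda: input_pipeline([VALIDATION_FILE],
                                      BATCH_SIZE, num_epochs=1)
    # infer a single real-valued column of width LEN_OF_LEXICON
    feature_columns = \
        tf.contrib.learn.infer_real_valued_columns_from_input_fn(input_fn)
    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))
    estimator.fit(input_fn=input_fn)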

Upvotes: 2
