I am new to TensorFlow and Stack Overflow, so apologies in advance for any silly errors. I've had good success feeding the lower-level interfaces, so I decided to give the tf.contrib.learn higher-level APIs a try because they looked so easy. I'm working in Google Cloud Datalab (a Jupyter notebook), but I've hit a roadblock and am looking for help.
Main Question: How do I instantiate a DNNClassifier so that I can feed it a feature that is itself a list of tf.float32 numbers?
Here are the details. I am reading a TFRecords-based input file with the following code:
def read_and_decode(filename_queue):
    # get a TensorFlow reader and read in an example
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # parse a single example
    features = tf.parse_single_example(
        serialized_example,
        features={'label': tf.FixedLenFeature([], tf.int64),
                  'features': tf.FixedLenFeature([], tf.string)})
    # convert the raw bytes to tensors and return
    bag_of_words = tf.decode_raw(features['features'], tf.float32)
    bag_of_words.set_shape([LEN_OF_LEXICON])
    label = tf.cast(features['label'], tf.int32)
    return bag_of_words, label
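For reference, the parsing spec above pins down the on-disk layout: each record holds an int64 'label' and a 'features' bytes string containing LEN_OF_LEXICON raw float32 values. Here is a minimal sketch of the writing side under that assumption (write_example and the numpy round-trip are just illustrative, not part of my actual pipeline):

import numpy as np
import tensorflow as tf

def write_example(writer, bag_of_words, label):
    # writer: a tf.python_io.TFRecordWriter
    # bag_of_words: numpy array of LEN_OF_LEXICON float32 values
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
        'features': tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[bag_of_words.astype(np.float32).tobytes()]))}))
    writer.write(example.SerializeToString())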
My unit test of this looks like this:
# unit test
filename = VALIDATION_FILE
my_filename_queue = tf.train.string_input_producer([filename], num_epochs=1)
x, y = read_and_decode(my_filename_queue)
print ('x[0] -> ', x[0])
print ('x[1] -> ', x[1])
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))
It gives the following output:
x[0] -> Tensor("strided_slice_6:0", shape=(), dtype=float32)
x[1] -> Tensor("strided_slice_7:0", shape=(), dtype=float32)
y -> Tensor("Cast_6:0", shape=(), dtype=int32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
x -> Tensor("DecodeRaw_3:0", shape=(2633,), dtype=float32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
The read_and_decode function is called by input_pipeline, which has the following definition and unit test:
def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_and_decode(filename_queue)
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
# unit test
x, y = input_pipeline([VALIDATION_FILE], BATCH_SIZE, num_epochs=1)
print ('y -> ', y, 'type -> ', type(y))
print ('x -> ', x, 'type -> ', type(x))
It produces the following output:
y -> Tensor("shuffle_batch_4:1", shape=(100,), dtype=int32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
x -> Tensor("shuffle_batch_4:0", shape=(100, 2633), dtype=float32) type -> <class 'tensorflow.python.framework.ops.Tensor'>
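(Side note for anyone reproducing this: printing the tensors only shows the graph nodes. To actually pull a batch you need the TF 1.x queue machinery, and because num_epochs is backed by a local variable, the local-variables initializer too. A minimal sketch, assuming the x and y from the unit test above:)

with tf.Session() as sess:
    # num_epochs creates a local variable, so initialize locals as well
    sess.run(tf.group(tf.global_variables_initializer(),
                      tf.local_variables_initializer()))
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        x_batch, y_batch = sess.run([x, y])
        print ('x_batch shape -> ', x_batch.shape)  # expect (100, 2633)
        print ('y_batch shape -> ', y_batch.shape)  # expect (100,)
    finally:
        coord.request_stop()
        coord.join(threads)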
The trainer that will take these feeds looks like this:
def run_training():
    #feature_columns = ????????????
    feature_columns = tf.contrib.layers.real_valued_column(
        "", dimension=LEN_OF_LEXICON, dtype=tf.float32)
    estimator = tf.contrib.learn.DNNClassifier(
        feature_columns=feature_columns,
        n_classes=5,
        hidden_units=[1024, 512, 256],
        optimizer=tf.train.ProximalAdagradOptimizer(
            learning_rate=0.1,
            l1_regularization_strength=0.001))
    estimator.fit(input_fn=lambda: input_pipeline(
        [VALIDATION_FILE], BATCH_SIZE, num_epochs=1))

# unit test
run_training()
The instantiation of DNNClassifier passes fine, but the call to estimator.fit() throws an exception (traceback snippet below). My input_pipeline is supplying the feed as shown in the TensorFlow documentation, but somehow the form of the data inside the tensor does not appear to be correct. Does anyone have any thoughts on this?
---------------- Traceback Snippet -----------------
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/dnn.pyc in _dnn_model_fn(features, labels, mode, params, config)
126 feature_columns=feature_columns,
127 weight_collections=[parent_scope],
--> 128 scope=scope)
129
130 hidden_layer_partitioner = (
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope)
247 scope,
248 output_rank=2,
--> 249 default_name='input_from_feature_columns')
250
251
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in _input_from_feature_columns(columns_to_tensors, feature_columns, weight_collections, trainable, scope, output_rank, default_name)
145 default_name):
146 """Implementation of `input_from(_sequence)_feature_columns`."""
--> 147 check_feature_columns(feature_columns)
148 with variable_scope.variable_scope(scope,
149 default_name=default_name,
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/feature_column_ops.pyc in check_feature_columns(feature_columns)
806 seen_keys = set()
807 for f in feature_columns:
--> 808 key = f.key
809 if key in seen_keys:
810 raise ValueError('Duplicate feature column key found for column: {}. '
AttributeError: 'str' object has no attribute 'key'
---------------- Answer -----------------
The solution is to use the function:
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input_fn(
    lambda: input_pipeline([INPUT_FILE], BATCH_SIZE, num_epochs=1))
It infers columns from the output signature of your input_fn. Easy peasy!
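For the curious: the traceback makes sense once you see that check_feature_columns iterates over feature_columns, so it expects a list (or other iterable) of columns. A bare _RealValuedColumn is a namedtuple, and iterating over it yields its string fields, hence 'str' object has no attribute 'key'. If you would rather spell the column out than infer it, the following sketch should be equivalent (it assumes the empty-string name is the key contrib.learn assigns when input_fn returns a bare features tensor, as the original attempt did):

# Explicit alternative: note the column is wrapped in a list.
# The "" name is carried over from the question's code.
feature_columns = [tf.contrib.layers.real_valued_column(
    "", dimension=LEN_OF_LEXICON, dtype=tf.float32)]
estimator = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    n_classes=5,
    hidden_units=[1024, 512, 256])
estimator.fit(input_fn=lambda: input_pipeline(
    [INPUT_FILE], BATCH_SIZE, num_epochs=1))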