Reputation: 289
I am trying to use the tf.Transform library (https://github.com/tensorflow/transform) for data preprocessing with TensorFlow via Apache Beam (Google Dataflow).
Here is my setup:
conda create -n tftransform python=2.7
source activate tftransform
pip install tensorflow
pip install tensorflow-transform
pip install dill==0.2.6
git clone https://github.com/tensorflow/transform.git
cd transform/
python setup.py install # for good measure ...
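Since the steps above install tensorflow-transform twice (once from PyPI and once from the clone), it may be worth checking which version is actually active before running anything. A minimal sketch, assuming the distribution name is tensorflow-transform as in the pip command above; installed_version is a hypothetical helper written for this post, not part of tf.Transform:

```python
# Report which version of a distribution is installed, if any.
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    version = None  # older interpreters fall back to pkg_resources below


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    if version is not None:
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None
    # Fallback for interpreters without importlib.metadata (e.g. Python 2.7).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version("tensorflow-transform"))
```

If the printed version is older than the checkout the examples come from, the examples and the installed library may not agree on the API.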
I then try to run simple_example (https://github.com/tensorflow/transform/blob/master/examples/simple_example.py):
python examples/simple_example.py
I get the following error:
AttributeError: 'DType' object has no attribute 'dtype'
(There is also a warning on import: No handlers could be found for logger "oauth2client.contrib.multistore_file")
Here is the stack trace:
Traceback (most recent call last):
  File "examples/simple_example.py", line 64, in <module>
    preprocessing_fn, tempfile.mkdtemp()))
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
    return m(transform, input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
    return transform.expand(input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 597, in expand
    self._output_dir)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 439, in __ror__
    result = p.apply(self, pvalueish, label)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/pipeline.py", line 249, in apply
    pvalueish_result = self.runner.apply(transform, pvalueish)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 162, in apply
    return m(transform, input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 168, in apply_PTransform
    return transform.expand(input)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/beam/impl.py", line 328, in expand
    self._preprocessing_fn, input_schema)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 416, in run_preprocessing_fn
    inputs = _make_input_columns(schema)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/impl_helper.py", line 218, in _make_input_columns
    placeholders = schema.as_batched_placeholders()
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in as_batched_placeholders
    for key, column_schema in self.column_schemas.items()}
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 87, in <dictcomp>
    for key, column_schema in self.column_schemas.items()}
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 133, in as_batched_placeholder
    return self.representation.as_batched_placeholder(self)
  File "/Users/XXX/anaconda/envs/tftransform/lib/python2.7/site-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 330, in as_batched_placeholder
    return tf.placeholder(column.domain.dtype,
AttributeError: 'DType' object has no attribute 'dtype'
Is this library production-ready? How can I make this work?
Upvotes: 0
Views: 1278
Reputation: 289
From the root of the cloned repository, I built a wheel from source and installed it:
python setup.py bdist_wheel
pip install ./dist/tensorflow_transform-0.1.6.dev0-py2-none-any.whl
This uninstalls tensorflow-transform-0.1.5 and installs tensorflow-transform-0.1.6.dev0.
Running python examples/simple_example.py now works, and I get the following result:
[{'s_integerized': 0,
  'x_centered': -1.0,
  'x_centered_times_y_normalized': -0.0,
  'y_normalized': 0.0},
 {'s_integerized': 1,
  'x_centered': 0.0,
  'x_centered_times_y_normalized': 0.0,
  'y_normalized': 0.5},
 {'s_integerized': 0,
  'x_centered': 1.0,
  'x_centered_times_y_normalized': 1.0,
  'y_normalized': 1.0}]
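For anyone wondering where these numbers come from, here is a plain-Python sketch of the arithmetic the output suggests: x minus its mean, y scaled to the range 0 to 1, and each string mapped to an integer id by descending frequency. The input rows are an assumption on my part (they are not shown above), and preprocess is a hypothetical stand-in written for illustration, not the tf.Transform API:

```python
from collections import Counter

# Assumed input rows; the post does not show the example's actual data.
rows = [
    {"x": 1, "y": 1, "s": "hello"},
    {"x": 2, "y": 2, "s": "world"},
    {"x": 3, "y": 3, "s": "hello"},
]


def preprocess(rows):
    """Full-pass preprocessing in plain Python (assumes y is not constant)."""
    xs = [float(r["x"]) for r in rows]
    ys = [float(r["y"]) for r in rows]

    # Statistics that need a pass over the whole dataset.
    x_mean = sum(xs) / len(xs)
    y_min, y_max = min(ys), max(ys)

    # Map each string to an integer id, most frequent string first.
    counts = Counter(r["s"] for r in rows)
    vocab = {s: i for i, (s, _) in enumerate(counts.most_common())}

    out = []
    for x, y, r in zip(xs, ys, rows):
        x_centered = x - x_mean
        y_normalized = (y - y_min) / (y_max - y_min)
        out.append({
            "s_integerized": vocab[r["s"]],
            "x_centered": x_centered,
            "x_centered_times_y_normalized": x_centered * y_normalized,
            "y_normalized": y_normalized,
        })
    return out


result = preprocess(rows)
for row in result:
    print(row)
```

Under those assumed inputs this reproduces the values above, including the -0.0 in the first row, which is just -1.0 multiplied by 0.0.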
Thanks to @elmer-garduno.
Upvotes: 1