Reputation: 193
I recently upgraded the TensorFlow version used in my program to the newly released 2.6.0, but I ran into a problem.
import tensorflow as tf
pattern = 'hdfs://mypath'
print(tf.io.gfile.glob(pattern))
The above API throws an exception in version 2.6:
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'hdfs' not implemented (file:xxxxx)
Then I checked the relevant implementation code and found that the official recommendation is to use tensorflow/io to access HDFS, and that the environment variable TF_USE_MODULAR_FILESYSTEM is provided to use the legacy access support. Since my code is complex and difficult to refactor in a short time, I tried setting this environment variable, but it still failed.
In general, my question is: how can I keep accessing HDFS through tf.io.gfile.glob in version 2.6?
Upvotes: 1
Views: 3756
Reputation: 193
TL;DR: Install tensorflow-io and import it.
After some trial and error, I found a solution (it may be the officially recommended way):
Since v2.6.0, TensorFlow no longer provides HDFS and S3 file system support in the framework itself; that support has moved to the tensorflow/io project.
Therefore, from this version on, to get HDFS or S3 support you only need to install tensorflow-io and import it in the training program:
$ pip install tensorflow-io
$ cat test.py
import tensorflow as tf
import tensorflow_io as tfio
print(tf.io.gfile.glob('hdfs://...'))
$ CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath --glob) python test.py
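If setting CLASSPATH on the command line is inconvenient (for example in a notebook or a launcher script), the same value can probably be set from Python. This is only a sketch: the helper names `hadoop_classpath` and `export_hadoop_classpath` are made up for illustration, and it assumes CLASSPATH has to be in the environment before the process touches HDFS for the first time.

```python
import os
import subprocess


def hadoop_classpath(hadoop_home, run=subprocess.run):
    """Return the output of `hadoop classpath --glob` for the given HADOOP_HOME."""
    hadoop_bin = os.path.join(hadoop_home, "bin", "hadoop")
    result = run(
        [hadoop_bin, "classpath", "--glob"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the hadoop command fails
    )
    return result.stdout.strip()


def export_hadoop_classpath(environ=os.environ, run=subprocess.run):
    """Set CLASSPATH in `environ` from HADOOP_HOME (KeyError if it is unset)."""
    environ["CLASSPATH"] = hadoop_classpath(environ["HADOOP_HOME"], run=run)
    return environ["CLASSPATH"]
```

Calling `export_hadoop_classpath()` at the top of the script, before any `hdfs://` path is used, would mirror the shell one-liner above. The `run` and `environ` parameters exist only so the helper can be exercised without a Hadoop installation.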
Importing tensorflow_io loads libtensorflow_io.so and libtensorflow_io_plugins.so, which contain the implementation and registration logic of each extra file system:
# tensorflow_io/python/ops/__init__.py
core_ops = LazyLoader("core_ops", "libtensorflow_io.so")
try:
    plugin_ops = _load_library("libtensorflow_io_plugins.so", "fs")
except NotImplementedError as e:
    warnings.warn("unable to load libtensorflow_io_plugins.so: {}".format(e))
    # Note: load libtensorflow_io.so imperatively in case of statically linking
    try:
        core_ops = _load_library("libtensorflow_io.so")
        plugin_ops = _load_library("libtensorflow_io.so", "fs")
    except NotImplementedError as e:
        warnings.warn("file system plugins are not loaded: {}".format(e))
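Since the import is needed only for its side effect (registering the file system plugins), it is easy to remove by accident, for example when a linter flags it as unused. A minimal sketch of a guard that makes the dependency explicit; `PLUGIN_SCHEMES`, `needs_plugin` and `ensure_plugin_loaded` are hypothetical names, and the scheme set is an assumption to adjust for your own setup:

```python
import importlib
import importlib.util
from urllib.parse import urlparse

# Assumption: schemes that need the tensorflow-io plugins; extend as needed.
PLUGIN_SCHEMES = frozenset({"hdfs"})


def needs_plugin(path):
    """Return True if the path uses a scheme listed in PLUGIN_SCHEMES."""
    return urlparse(path).scheme.lower() in PLUGIN_SCHEMES


def ensure_plugin_loaded(path):
    """Import tensorflow_io for its side effect when the path needs it.

    Returns True if tensorflow_io was imported, False if it was not needed.
    """
    if not needs_plugin(path):
        return False
    if importlib.util.find_spec("tensorflow_io") is None:
        raise ImportError(
            "tensorflow-io is required for {!r}; "
            "install it with: pip install tensorflow-io".format(path)
        )
    importlib.import_module("tensorflow_io")
    return True
```

One way to use it would be to call `ensure_plugin_loaded(pattern)` right before `tf.io.gfile.glob(pattern)`, so a missing package fails with an explicit message instead of the UnimplementedError.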
Ref:
Upvotes: 2