Reputation: 208
I have a Google Cloud ML job that needs to load numpy .npz files from a GCS bucket. I followed this example on how to load .npy files from GCS, but it didn't work for me since .npz files are compressed.
Here's my code:
from StringIO import StringIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io
f = StringIO(file_io.read_file_to_string('gs://my-bucket/data.npz'))
data = np.load(f)
And here's the error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 10: invalid start byte
Apparently, decoding the data to str is not correct, but I'm not sure how to address this.
Can someone help? Thanks!
Upvotes: 3
Views: 2823
Reputation: 208
An alternative is (note the difference between earlier TF versions and later ones):
from io import BytesIO
import numpy as np
from tensorflow.python.lib.io import file_io
from tensorflow import __version__ as tf_version
if tf_version >= '1.1.0':
    mode = 'rb'  # binary mode
else:  # for TF version 1.0
    mode = 'r'
f_stream = file_io.FileIO('mydata.npz', mode)
d = np.load(BytesIO(f_stream.read()))
Similarly, for pickle files:
import pickle
d = pickle.load(file_io.FileIO('mydata.pickle', mode))
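The version check above compares version strings lexicographically. That happens to work for the '1.1.0' threshold, but string comparison is fragile in general (for example, '1.10.0' sorts before '1.2.0'). Below is a minimal sketch of a numeric comparison; version_tuple and file_mode_for are hypothetical helper names, not part of TensorFlow, and the parsing only looks at the leading digits of each dotted component.

```python
import re


def version_tuple(version, width=3):
    """Hypothetical helper: parse '1.10.0' or '1.1.0-rc0' into (1, 10, 0) / (1, 1, 0)."""
    parts = []
    for piece in version.split('.')[:width]:
        match = re.match(r'\d+', piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < width:  # pad short versions such as '2.3'
        parts.append(0)
    return tuple(parts)


def file_mode_for(tf_version):
    """Pick the FileIO mode using the same 1.1.0 threshold as the answer above."""
    return 'rb' if version_tuple(tf_version) >= (1, 1, 0) else 'r'
```

With this, mode = file_mode_for(tf_version) could replace the if/else block.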
Upvotes: 1
Reputation: 8389
Try using io.BytesIO instead, which has the added bonus of being forwards-compatible with Python 3:
import io
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io
f = io.BytesIO(file_io.read_file_to_string('gs://my-bucket/data.npz',
                                           binary_mode=True))
data = np.load(f)
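To illustrate why the bytes must not be decoded as text, here is a small sketch that runs without TensorFlow or a bucket. It builds an .npz archive in memory (an assumption made purely for the demo; in the real job the bytes would come from the bucket) and shows that the raw content is a zip archive that is not valid UTF-8, while wrapping the same bytes in io.BytesIO loads fine.

```python
import io

import numpy as np

# Build an uncompressed .npz archive in memory (stand-in for the file in the bucket).
buf = io.BytesIO()
np.savez(buf, x=np.full(4, 0xFF, dtype=np.uint8))
raw = buf.getvalue()

# An .npz file is a zip archive, so it starts with the zip magic bytes.
is_zip = raw[:2] == b'PK'

# Treating the archive as UTF-8 text fails, much like the error in the question.
try:
    raw.decode('utf-8')
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False

# Keeping the data as bytes and wrapping it in BytesIO works.
data = np.load(io.BytesIO(raw))
x = data['x']
```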
Upvotes: 1
Reputation: 208
It turns out I need to set the binary flag to True in file_io.read_file_to_string().
Here's the working code:
from io import BytesIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io
f = BytesIO(file_io.read_file_to_string('gs://my-bucket/data.npz', binary_mode=True))
data = np.load(f)
And this works for both compressed and uncompressed .npz files.
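As a rough check of that claim, the sketch below round-trips the same arrays through np.savez (uncompressed) and np.savez_compressed, entirely in memory. load_npz_bytes is a hypothetical helper name introduced for illustration; in the real job its argument would be the result of file_io.read_file_to_string(..., binary_mode=True).

```python
from io import BytesIO

import numpy as np


def load_npz_bytes(raw):
    """Hypothetical helper: turn raw .npz bytes into a dict of arrays."""
    with np.load(BytesIO(raw)) as archive:
        return {name: archive[name] for name in archive.files}


arrays = {'a': np.arange(6).reshape(2, 3), 'b': np.linspace(0.0, 1.0, 5)}

# Write one uncompressed and one compressed archive into in-memory buffers.
plain, packed = BytesIO(), BytesIO()
np.savez(plain, **arrays)
np.savez_compressed(packed, **arrays)

# Both kinds of archive load through the same bytes -> BytesIO -> np.load path.
from_plain = load_npz_bytes(plain.getvalue())
from_packed = load_npz_bytes(packed.getvalue())
```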
Upvotes: 5