astromz

Reputation: 208

How to load numpy npz files in google-cloud-ml jobs or from Google Cloud Storage?

I have a google-cloud-ml job that needs to load numpy .npz files from a Google Cloud Storage (gs://) bucket. I followed this example on how to load .npy files from gs, but it didn't work for me, since .npz files are compressed.

Here's my code:

from StringIO import StringIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

f = StringIO(file_io.read_file_to_string('gs://my-bucket/data.npz'))
data = np.load(f)

And here's the error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 10: invalid start byte

Apparently, decoding the binary data to str is not correct, but I'm not sure how to address this.

Can someone help? Thanks!

Upvotes: 3

Views: 2823

Answers (3)

astromz

Reputation: 208

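An alternative is to read the file through file_io.FileIO (note that the file mode differs between TF 1.0 and later versions):

from io import BytesIO

import numpy as np
from tensorflow.python.lib.io import file_io
from tensorflow import __version__ as tf_version

# Compare versions numerically; plain string comparison misorders e.g. '1.10.0' and '1.2.0'.
major, minor = (int(p) for p in tf_version.split('.')[:2])
if (major, minor) >= (1, 1):
    mode = 'rb'
else:  # for TF version 1.0
    mode = 'r'

f_stream = file_io.FileIO('mydata.npz', mode)
d = np.load(BytesIO(f_stream.read()))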

Similarly, for pickle files:

import pickle
d = pickle.load(file_io.FileIO('mydata.pickle', mode))

Upvotes: 1

rhaertel80

Reputation: 8389

Try using io.BytesIO instead, which has the added bonus of being forwards-compatible with Python 3:

import io
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

# binary_mode belongs to read_file_to_string, not to io.BytesIO
f = io.BytesIO(file_io.read_file_to_string('gs://my-bucket/data.npz',
                                           binary_mode=True))
data = np.load(f)

Upvotes: 1

astromz

Reputation: 208

It turns out I needed to set the binary_mode flag to True in file_io.read_file_to_string(), so that the file contents come back as raw bytes instead of being decoded as UTF-8.

Here's the working code:

from io import BytesIO
import tensorflow as tf
import numpy as np
from tensorflow.python.lib.io import file_io

f = BytesIO(file_io.read_file_to_string('gs://my-bucket/data.npz', binary_mode=True))
data = np.load(f)

And this works for both compressed and uncompressed .npz files.
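
To see why the bytes-plus-BytesIO approach works, here is a minimal local sketch that needs only numpy. It assumes the in-memory archive built below stands in for whatever file_io.read_file_to_string(..., binary_mode=True) would return. The array names x and y are made up for illustration.

```python
import io

import numpy as np

# Build an .npz archive in memory. This stands in for the raw bytes
# that a binary-mode read of gs://.../data.npz would hand back.
buf = io.BytesIO()
np.savez_compressed(buf, x=np.arange(5), y=np.ones((2, 2)))
raw = buf.getvalue()

# An .npz file is a zip archive, so its payload is binary, not text;
# zip files start with the signature bytes b'PK'.
is_zip = raw[:2] == b'PK'

# Wrapping the bytes in BytesIO gives np.load the seekable
# file-like object it needs to read the archive.
data = np.load(io.BytesIO(raw))
x = data['x']
y = data['y']
```

Because the payload is a zip archive, any attempt to decode it as UTF-8 text is likely to fail, which would explain the UnicodeDecodeError in the question.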

Upvotes: 5
