Ahmed Magdy

Reputation: 43

Load .npy file from Google Cloud Storage with TensorFlow

I'm trying to load .npy files from my Google Cloud Storage bucket into my model. I followed the example in "Load numpy array in google-cloud-ml job", but I get this error:

'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

Can you help me, please? Here is a sample of the code.

Here I read the metadata file:

with file_io.FileIO(metadata_filename, 'r') as f:
    self._metadata = [line.strip().split('|') for line in f]

And here I start processing it:

if self._offset >= len(self._metadata):
    self._offset = 0
    random.shuffle(self._metadata)
meta = self._metadata[self._offset]
self._offset += 1
text = meta[3]
if self._cmudict and random.random() < _p_cmudict:
    text = ' '.join([self._maybe_get_arpabet(word) for word in text.split(' ')])

input_data = np.asarray(text_to_sequence(text, self._cleaner_names), dtype=np.int32)
f = StringIO(file_io.read_file_to_string(
    os.path.join('gs://path', meta[0])))
linear_target = tf.Variable(initial_value=np.load(f), name='linear_target')
s = StringIO(file_io.read_file_to_string(
    os.path.join('gs://path', meta[1])))
mel_target = tf.Variable(initial_value=np.load(s), name='mel_target')
return (input_data, mel_target, linear_target, len(linear_target))

And this is a sample of the data.

Upvotes: 1

Views: 1426

Answers (1)

Nikhil Kothari

Reputation: 5225

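This is likely because your file doesn't contain UTF-8 encoded text. A .npy file is binary and begins with the magic bytes \x93NUMPY, which matches the 0x93 at position 0 in the error.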

You may need to open the file_io.FileIO instance as a binary file using mode='rb', or set binary_mode=True in the call to read_file_to_string.

The data is then returned as a sequence of bytes rather than a decoded string.
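As a rough sketch of what that could look like (not tested against your bucket): the snippet below builds .npy bytes in memory, shows why decoding them as text fails, and loads them back through io.BytesIO instead of StringIO. The helper names npy_bytes, load_npy_from_bytes and load_npy_from_gcs are made up for illustration, and the gs:// path you pass in is your own. The GCS helper assumes TensorFlow is installed and imports file_io only when called.

```python
import io

import numpy as np


def npy_bytes(array):
    """Serialize an array to the .npy format in memory (stand-in for a file on GCS)."""
    buf = io.BytesIO()
    np.save(buf, array)
    return buf.getvalue()


def load_npy_from_bytes(raw):
    """Load an array from raw .npy bytes; np.load expects a binary file-like object."""
    return np.load(io.BytesIO(raw))


def load_npy_from_gcs(path):
    """Hypothetical helper: read a .npy object from a gs:// path as bytes and load it."""
    # Imported lazily so the local demonstration below runs without TensorFlow.
    from tensorflow.python.lib.io import file_io

    # binary_mode=True returns bytes instead of trying to decode them as UTF-8.
    raw = file_io.read_file_to_string(path, binary_mode=True)
    return load_npy_from_bytes(raw)


# Local demonstration with in-memory data, no bucket required.
raw = npy_bytes(np.arange(6, dtype=np.float32).reshape(2, 3))
print(raw[:6])  # the .npy magic string, which starts with byte 0x93

try:
    raw.decode("utf-8")  # roughly what reading the file in text mode attempts
except UnicodeDecodeError as err:
    print("text mode fails:", err)

restored = load_npy_from_bytes(raw)
print(restored.shape)
```

In the question's code, this would mean replacing each StringIO(file_io.read_file_to_string(...)) with a BytesIO wrapped around the binary read, assuming the objects at those paths are plain .npy files.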

Upvotes: 2
