Reputation: 5563
I have been experimenting with a Keras example, which needs to import MNIST data:
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
It generates an error message such as:
Exception: URL fetch failure on https://s3.amazonaws.com/img-datasets/mnist.pkl.gz: None -- [Errno 110] Connection timed out
This is probably related to the network environment I am using. Is there any function or code that would let me directly import a manually downloaded copy of the MNIST data set?
I tried the following approach:
import sys
import pickle
import gzip
f = gzip.open('/data/mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = pickle.load(f)
else:
    data = pickle.load(f, encoding='bytes')
f.close()
import numpy as np
(x_train, _), (x_test, _) = data
Then I get the following error message:
Traceback (most recent call last):
File "test.py", line 45, in <module>
(x_train, _), (x_test, _) = data
ValueError: too many values to unpack (expected 2)
Upvotes: 22
Views: 63652
Reputation: 47
Gogasca's answer worked for me with a little adjustment. For Python 3.9, changing the code in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that it uses the path variable as the full path, instead of prepending origin_folder, makes it possible to pass any local path to the downloaded file.
path = path  # use the given path directly as the full local path
# origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
# path = get_file(
#     path, origin=origin_folder + 'mnist.npz',
#     file_hash='731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
with np.load(path, allow_pickle=True) as f:  # pylint: disable=unexpected-keyword-arg
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
return (x_train, y_train), (x_test, y_test)
path = "/Users/username/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.npz"
(train_images, train_labels), (test_images, test_labels) = mnist.load_data(path=path)
Upvotes: 0
Reputation: 1016
keras.datasets.mnist.load_data() will attempt to fetch from the remote repository even when a local file path is specified. However, the easiest workaround for loading the downloaded file is to use numpy.load(), just like load_data() does internally:
import numpy as np

path = '/tmp/data/mnist.npz'
with np.load(path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
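If you want this to read like load_data(), here is a minimal sketch wrapping the same np.load logic in a helper; load_local_mnist is a hypothetical name, not a Keras API:
import numpy as np

def load_local_mnist(path):
    # Mirrors the return shape of keras.datasets.mnist.load_data().
    with np.load(path, allow_pickle=True) as f:
        return (f['x_train'], f['y_train']), (f['x_test'], f['y_test'])

(x_train, y_train), (x_test, y_test) = load_local_mnist('/tmp/data/mnist.npz')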
Upvotes: 1
Reputation: 1538
Download mnist.npz from https://s3.amazonaws.com/img-datasets/mnist.npz and copy it to the ~/.keras/datasets/ directory. Then load the data:
import keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
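If you prefer to script the download step as well, here is a minimal sketch, assuming the S3 URL above is still reachable (another answer below notes the file later moved to Google Cloud Storage):
import os
import urllib.request

# Download mnist.npz straight into the Keras datasets cache.
url = 'https://s3.amazonaws.com/img-datasets/mnist.npz'
datasets_dir = os.path.expanduser('~/.keras/datasets')
os.makedirs(datasets_dir, exist_ok=True)
urllib.request.urlretrieve(url, os.path.join(datasets_dir, 'mnist.npz'))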
Upvotes: 4
Reputation: 10058
The Keras MNIST file is now located at a new path in Google Cloud Storage (before, it was in AWS S3):
https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
When using tf.keras.datasets.mnist.load_data(), you can pass a path parameter. load_data() will call get_file(), which takes fname as a parameter; if path is a full path and the file exists, it will not be downloaded.
Example:
# gsutil cp gs://tensorflow/tf-keras-datasets/mnist.npz /tmp/data/mnist.npz
# python3
>>> import tensorflow as tf
>>> path = '/tmp/data/mnist.npz'
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)
>>> len(train_images)
60000
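If gsutil is not available, a sketch of the same idea using tf.keras.utils.get_file (a public Keras utility) to fetch the file into the Keras cache and return its full path:
>>> import tensorflow as tf
>>> path = tf.keras.utils.get_file(
...     fname='mnist.npz',
...     origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz')
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)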
Upvotes: 12
Reputation: 1370
You do not need additional code for that, but can tell load_data to load a local version in the first place (a short sketch of the steps follows):
1. Download mnist.npz and place it in the ~/.keras/datasets/ directory (on Linux and macOS).
2. Call load_data(path='mnist.npz') with the right file name.
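A minimal sketch of those two steps, assuming mnist.npz has already been downloaded to the current working directory:
import os
import shutil

from keras.datasets import mnist

# Step 1: copy the downloaded archive into ~/.keras/datasets/.
datasets_dir = os.path.expanduser('~/.keras/datasets')
os.makedirs(datasets_dir, exist_ok=True)
shutil.copy('mnist.npz', datasets_dir)

# Step 2: load_data() finds the cached file and skips the download.
(x_train, y_train), (x_test, y_test) = mnist.load_data(path='mnist.npz')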
Upvotes: 12
Reputation: 4647
Well, the keras.datasets.mnist file is really short. You can manually simulate the same action, that is:
import sys
import gzip

# cPickle only exists on Python 2; fall back to pickle on Python 3.
if sys.version_info < (3,):
    import cPickle
else:
    import pickle as cPickle

f = gzip.open('mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = cPickle.load(f)
else:
    data = cPickle.load(f, encoding='bytes')
f.close()
(x_train, _), (x_test, _) = data
Upvotes: 14