Reputation: 1640
I am getting the following error when I run mnist = input_data.read_data_sets("MNIST_data", one_hot = True):
EOFError: Compressed file ended before the end-of-stream marker was reached
Even when I extract the file manually and place it in the MNIST_data directory, the program still tries to download the file instead of using the extracted one.
When I extract the file manually with WinZip, WinZip tells me that the file is corrupt.
How do I solve this problem?
I can't even load the data set yet, and I still have to debug the program itself. Please help.
I installed TensorFlow with pip, so I don't have the TensorFlow examples. I therefore got the input_data file from GitHub and saved it in the same directory as my main.py. The error only concerns the .gz file: the program could not extract it.
runfile('C:/Users/Nikhil/Desktop/Tensor Flow/tensf.py', wdir='C:/Users/Nikhil/Desktop/Tensor Flow')
Reloaded modules: input_data
Extracting MNIST_data/train-images-idx3-ubyte.gz
C:\Users\Nikhil\Anaconda3\lib\gzip.py:274: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  return self._buffer.read(size)
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Nikhil/Desktop/Tensor Flow/tensf.py', wdir='C:/Users/Nikhil/Desktop/Tensor Flow')
  File "C:\Users\Nikhil\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\Nikhil\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Nikhil/Desktop/Tensor Flow/tensf.py", line 26, in <module>
    mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)
  File "C:\Users\Nikhil\Desktop\Tensor Flow\input_data.py", line 181, in read_data_sets
    train_images = extract_images(local_file)
  File "C:\Users\Nikhil\Desktop\Tensor Flow\input_data.py", line 60, in extract_images
    buf = bytestream.read(rows * cols * num_images)
  File "C:\Users\Nikhil\Anaconda3\lib\gzip.py", line 274, in read
    return self._buffer.read(size)
  File "C:\Users\Nikhil\Anaconda3\lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "C:\Users\Nikhil\Anaconda3\lib\gzip.py", line 480, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
Upvotes: 21
Views: 88933
Reputation: 1113
I couldn't find the Keras dataset download folder mentioned in other answers on my Linux machine.
So I found a somewhat hacky but easy fix for this problem. It turns out there is a built-in way to force the mnist library to download the files again.
Open venv/lib/python3.10/site-packages/mnist/__init__.py, find every force=False in this file, and set it to force=True.
Here's the file after the update:
import os
import functools
import operator
import gzip
import struct
import array
import tempfile
try:
    from urllib.request import urlretrieve
except ImportError:
    from urllib import urlretrieve  # py2
try:
    from urllib.parse import urljoin
except ImportError:
    from urlparse import urljoin
import numpy

__version__ = '0.2.2'

# `datasets_url` and `temporary_dir` can be set by the user using:
# >>> mnist.datasets_url = 'http://my.mnist.url'
# >>> mnist.temporary_dir = lambda: '/tmp/mnist'
datasets_url = 'http://yann.lecun.com/exdb/mnist/'
temporary_dir = tempfile.gettempdir

class IdxDecodeError(ValueError):
    """Raised when an invalid idx file is parsed."""
    pass

def download_file(fname, target_dir=None, force=True):
    """Download fname from the datasets_url, and save it to target_dir,
    unless the file already exists, and force is False.

    Parameters
    ----------
    fname : str
        Name of the file to download
    target_dir : str
        Directory where to store the file
    force : bool
        Force downloading the file, if it already exists

    Returns
    -------
    fname : str
        Full path of the downloaded file
    """
    target_dir = target_dir or temporary_dir()
    target_fname = os.path.join(target_dir, fname)
    if force or not os.path.isfile(target_fname):
        url = urljoin(datasets_url, fname)
        urlretrieve(url, target_fname)
    return target_fname

def parse_idx(fd):
    """Parse an IDX file, and return it as a numpy array.

    Parameters
    ----------
    fd : file
        File descriptor of the IDX file to parse
    endian : str
        Byte order of the IDX file. See [1] for available options

    Returns
    -------
    data : numpy.ndarray
        Numpy array with the dimensions and the data in the IDX file

    1. https://docs.python.org/3/library/struct.html
        #byte-order-size-and-alignment
    """
    DATA_TYPES = {0x08: 'B',  # unsigned byte
                  0x09: 'b',  # signed byte
                  0x0b: 'h',  # short (2 bytes)
                  0x0c: 'i',  # int (4 bytes)
                  0x0d: 'f',  # float (4 bytes)
                  0x0e: 'd'}  # double (8 bytes)
    header = fd.read(4)
    if len(header) != 4:
        raise IdxDecodeError('Invalid IDX file, '
                             'file empty or does not contain a full header.')
    zeros, data_type, num_dimensions = struct.unpack('>HBB', header)
    if zeros != 0:
        raise IdxDecodeError('Invalid IDX file, '
                             'file must start with two zero bytes. '
                             'Found 0x%02x' % zeros)
    try:
        data_type = DATA_TYPES[data_type]
    except KeyError:
        raise IdxDecodeError('Unknown data type '
                             '0x%02x in IDX file' % data_type)
    dimension_sizes = struct.unpack('>' + 'I' * num_dimensions,
                                    fd.read(4 * num_dimensions))
    data = array.array(data_type, fd.read())
    data.byteswap()  # looks like array.array reads data as little endian
    expected_items = functools.reduce(operator.mul, dimension_sizes)
    if len(data) != expected_items:
        raise IdxDecodeError('IDX file has wrong number of items. '
                             'Expected: %d. Found: %d' % (expected_items,
                                                          len(data)))
    return numpy.array(data).reshape(dimension_sizes)

def download_and_parse_mnist_file(fname, target_dir=None, force=True):
    """Download the IDX file named fname from the URL specified in dataset_url
    and return it as a numpy array.

    Parameters
    ----------
    fname : str
        File name to download and parse
    target_dir : str
        Directory where to store the file
    force : bool
        Force downloading the file, if it already exists

    Returns
    -------
    data : numpy.ndarray
        Numpy array with the dimensions and the data in the IDX file
    """
    fname = download_file(fname, target_dir=target_dir, force=force)
    fopen = gzip.open if os.path.splitext(fname)[1] == '.gz' else open
    with fopen(fname, 'rb') as fd:
        return parse_idx(fd)

def train_images():
    """Return train images from Yann LeCun MNIST database as a numpy array.
    Download the file, if not already found in the temporary directory of
    the system.

    Returns
    -------
    train_images : numpy.ndarray
        Numpy array with the images in the train MNIST database. The first
        dimension indexes each sample, while the other two index rows and
        columns of the image
    """
    return download_and_parse_mnist_file('train-images-idx3-ubyte.gz')

def test_images():
    """Return test images from Yann LeCun MNIST database as a numpy array.
    Download the file, if not already found in the temporary directory of
    the system.

    Returns
    -------
    test_images : numpy.ndarray
        Numpy array with the images in the train MNIST database. The first
        dimension indexes each sample, while the other two index rows and
        columns of the image
    """
    return download_and_parse_mnist_file('t10k-images-idx3-ubyte.gz')

def train_labels():
    """Return train labels from Yann LeCun MNIST database as a numpy array.
    Download the file, if not already found in the temporary directory of
    the system.

    Returns
    -------
    train_labels : numpy.ndarray
        Numpy array with the labels 0 to 9 in the train MNIST database.
    """
    return download_and_parse_mnist_file('train-labels-idx1-ubyte.gz')

def test_labels():
    """Return test labels from Yann LeCun MNIST database as a numpy array.
    Download the file, if not already found in the temporary directory of
    the system.

    Returns
    -------
    test_labels : numpy.ndarray
        Numpy array with the labels 0 to 9 in the train MNIST database.
    """
    return download_and_parse_mnist_file('t10k-labels-idx1-ubyte.gz')
You can switch back to force=False whenever you don't want to download the files again (and no longer face this silly issue xD), but this dataset takes about a second to download anyway, so it shouldn't ever be a big issue.
Upvotes: 1
Reputation: 1
I had the same issue. First you have to download the dataset using the code below (I am using PyCharm):
import tensorflow
name = tensorflow.keras.datasets.fashion_mnist
name.load_data()
Run this first; it will download the data. Then you can load it using the lines below:
name = tensorflow.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = name.load_data()
Upvotes: 0
Reputation: 131
This is very simple on Windows:
Go to C:\Users\Username\.keras\datasets
and delete the dataset that has the error or that you want to re-download.
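The same clean-up can be scripted so that it works on any operating system. This is a minimal sketch, assuming the cache lives in the .keras/datasets folder under your home directory as in the path above; `remove_cached_dataset` and its parameters are hypothetical names, not part of Keras:

```python
import shutil
from pathlib import Path


def remove_cached_dataset(name, cache_dir=None):
    """Delete the file or folder called `name` from the dataset cache.

    Returns True if something was deleted, False if nothing matched.
    """
    # Assumption: the default cache is ~/.keras/datasets, as in the answer above.
    if cache_dir is None:
        cache_dir = Path.home() / ".keras" / "datasets"
    target = Path(cache_dir) / name
    if target.is_dir():
        shutil.rmtree(target)   # dataset stored as a folder of files
    elif target.is_file():
        target.unlink()         # dataset stored as a single archive
    else:
        return False
    return True
```

For example, `remove_cached_dataset("mnist.npz")` would remove a cached file of that name, assuming that is what the broken download is called on your machine; the next call to load_data() should then download it again.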
Upvotes: 5
Reputation: 155
I had the same issue when downloading datasets using torchvision on Windows. I was able to fix this by deleting all files from the following path: C:\Users\UserName\MNIST\raw
Upvotes: 1
Reputation: 3434
This happens when a dataset download does not complete for some reason. For anyone struggling with this on Windows while working with PyTorch: I resolved the same issue by deleting the folder at the path below.
C:/Users/UserName/.pytorch/foldername
Also note that in your case .pytorch may not be visible if hidden files are not shown.
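If you write your own download code, the cause described above (a download that stops partway and leaves a truncated file in the cache) can be avoided by downloading to a temporary name and renaming only on success. This is a minimal sketch using only the standard library; `fetch_atomically` is a hypothetical helper, not something PyTorch or torchvision provides:

```python
import os
import tempfile
import urllib.request


def fetch_atomically(url, target_path):
    """Download `url` to `target_path`; a failed download leaves no file there."""
    target_path = os.path.abspath(target_path)
    target_dir = os.path.dirname(target_path)
    os.makedirs(target_dir, exist_ok=True)
    # Temporary file in the same directory, so the final rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".part")
    os.close(fd)
    try:
        # urlretrieve raises ContentTooShortError if fewer bytes arrive than
        # the server announced in Content-Length.
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, target_path)
    except BaseException:
        # Failed or interrupted: discard the partial file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target_path
```

With this pattern the final file name only ever refers to a download that finished, so a later run does not try to extract a half-written archive.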
Upvotes: 1
Reputation: 41
First, remove the partially downloaded fashion_mnist directory from the Keras directory.
After that, download the files from GitHub:
https://github.com/zalandoresearch/fashion-mnist/blob/master/data/fashion/train-labels-idx1-ubyte.gz
Place those files and the extracted files in the fashion_mnist directory in the Keras folder.
This will solve your problem.
Upvotes: 1
Reputation: 373
To anyone struggling: I had a similar issue on my Mac (Mojave 10.14.3).
I am taking a class on Udemy using Anaconda and Jupyter, and used the following steps to fix the issue.
Finder > Go > Go to Folder > in the Go to Folder window, enter ~/.keras/datasets/fashion_mnist > delete the partially downloaded files.
Go to GitHub and get fashion-mnist-master from https://github.com/zalandoresearch/fashion-mnist.git
In the download, locate the data > fashion folder and unzip the four files.
Place the four unzipped files into ~/.keras/datasets/fashion_mnist
Open Jupyter Lab and, in a new page, insert the following:
from keras.datasets import fashion_mnist
# message states "Using TensorFlow backend"
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
# it will then cycle through as if the download were successful
Good luck, and may the odds be in your favor.
Upvotes: 2
Reputation: 8319
If the download gets interrupted, delete the C:/tmp/imagenet folder and restart the download.
Also, for people who get here via Google: run the classify_image.py file from the command line instead of using IDLE:
python classify_image.py
Upvotes: 1
Reputation: 393
This happens because, for some reason, the download of the MNIST dataset is incomplete.
You will have to manually delete the downloaded folder, which usually resides in ~/.keras/datasets or in whatever path you specified, in your case MNIST_data.
Perform the following steps in the terminal (Ctrl + Alt + T):
cd ~/.keras/datasets/
rm -rf "dataset name"
You should be good to go!
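If you are not sure which of the downloaded files is incomplete, you can test each archive before deleting anything. This is a minimal sketch using the standard gzip module; `is_gzip_intact` and `remove_truncated_archives` are hypothetical helper names, and the sketch assumes the archives are the .gz files directly inside the given directory:

```python
import gzip
import zlib
from pathlib import Path


def is_gzip_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read to the end; a truncated file raises EOFError on the way.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early; OSError: not a gzip file, or the
        # file is missing or unreadable; zlib.error: corrupt compressed data.
        return False
    return True


def remove_truncated_archives(directory):
    """Delete every broken *.gz file in `directory`; return the deleted names."""
    removed = []
    for archive in sorted(Path(directory).glob("*.gz")):
        if not is_gzip_intact(archive):
            archive.unlink()
            removed.append(archive.name)
    return removed
```

Calling `remove_truncated_archives("MNIST_data")` would then leave only the archives that decompress fully, so the next run downloads just the missing ones.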
Upvotes: 26