Reputation: 111
I am working on the Kaggle 'APTOS 2019 Blindness Detection' dataset, and the dataset is inside a zip file. I want to preprocess it to feed into a deep learning model.
My code looks like this:
train_dir = '../input/train_images'
train_labels = pd.read_csv('../input/train.csv')
train_labels['diagnosis'] = train_labels['diagnosis'].astype(str)
test_dir = '../input/test_images'
Then, to preprocess the images, I wrote:
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
vertical_flip=True,
rescale=1./255,)
test_datagen = ImageDataGenerator(rescale = 1./255)
train_generator = train_datagen.flow_from_dataframe(
train_labels[:3295],
directory=train_dir,
x_col='id_code', y_col='diagnosis',
target_size=(150, 150),
color_mode='rgb',
class_mode='categorical',
batch_size=32,
shuffle=True,)
validation_generator = test_datagen.flow_from_dataframe(
train_labels[3295:],
directory=train_dir,
x_col='id_code', y_col='diagnosis',
target_size=(150, 150),
color_mode='rgb',
class_mode='categorical',
batch_size=32,
shuffle=True,)
But when I run the code, I get:
Found 0 validated image filenames belonging to 0 classes.
Found 0 validated image filenames belonging to 0 classes.
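flow_from_dataframe keeps only the rows whose x_col value, joined to directory, points to an existing file with an image extension such as .png. "Found 0 validated image filenames" therefore usually means none of the names matched a file. A quick check is to count which ids are missing on disk. This is a minimal sketch; missing_image_files is a hypothetical helper, and the .png extension is an assumption to verify against listdir(train_dir).

```python
import os
from typing import Iterable, List


def missing_image_files(ids: Iterable[str], directory: str, extension: str = "") -> List[str]:
    """Return the ids whose file (id + extension) does not exist in `directory`.

    Hypothetical diagnostic helper: `extension` is an assumption, e.g. ".png"
    when the CSV ids are stored without a file extension.
    """
    missing = []
    for image_id in ids:
        path = os.path.join(directory, str(image_id) + extension)
        if not os.path.isfile(path):
            missing.append(image_id)
    return missing
```

For example, if missing_image_files(train_labels['id_code'], train_dir, extension='.png') returns an empty list while the call without an extension returns every id, the CSV ids probably lack the extension, and train_labels['id_code'] = train_labels['id_code'] + '.png' before calling flow_from_dataframe should let the generator find the files. If both calls return every id, the directory itself is likely wrong or still zipped.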
I have also tried unzipping the files, but it won't unzip and says:
FileNotFoundError: [Errno 2] No such file or directory: 'train_images.zip'
# importing required modules
from zipfile import ZipFile
# specifying the zip file name
file_name = "../input/train_images.zip"
# opening the zip file in READ mode
with ZipFile(file_name, 'r') as zf:
    # extracting all the files
    print('Extracting all the files now...')
    zf.extractall()
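A FileNotFoundError from ZipFile means no file exists at that path relative to the notebook's current working directory. Rather than guessing the archive name, the input tree can be searched for it. This is a minimal sketch; find_files is a hypothetical helper, and using ../input as the root is taken from the paths in the question.

```python
import os
from typing import Iterable, List


def find_files(root: str, suffixes: Iterable[str] = (".zip",)) -> List[str]:
    """Walk `root` recursively and return sorted paths whose names end with any suffix.

    Hypothetical helper; matching is case-insensitive so "X.ZIP" is found too.
    """
    suffixes = tuple(s.lower() for s in suffixes)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffixes):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Running print(find_files('../input')) lists every archive that actually exists, and print(find_files('../input', ('.csv',))) does the same for the CSVs; an empty archive list suggests the images are already extracted somewhere under ../input.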
Can someone help me fix this problem? I would appreciate it.
Upvotes: 1
Views: 3757
Reputation: 106
I got stuck on this on Kaggle today! It was the first time I had looked at a dataset that was archived.
I know people say to just run listdir('../input/') and you will see the images, or to look in '../input/train_images/', but all I found were the zip files and the CSVs!
So I extracted the zipped training and test datasets to the Kaggle working directory.
This was for aerial-cactus-identification. The input directory is ../input/aerial-cactus-identification/ and contains train.zip, test.zip, and train.csv (filenames and classes).
I went ahead and extracted train.zip:
import os
import zipfile
Dataset = "train"
with zipfile.ZipFile("../input/aerial-cactus-identification/"+Dataset+".zip","r") as z:
    z.extractall(".")
print(os.listdir("../working/"))
And yes, it is extracted to the working directory. I did the same for test.zip:
Dataset = "test"
with zipfile.ZipFile("../input/aerial-cactus-identification/"+Dataset+".zip","r") as z:
    z.extractall(".")
print(os.listdir("../working/"))
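The same extraction is repeated for train and test, so a loop over the archive names does both in one place. This is a minimal sketch; extract_archives is a hypothetical helper, and the input layout is taken from the answer. The later use of ../working/train/ suggests each archive contains a top-level folder named after it, which is an assumption here.

```python
import os
import zipfile
from typing import Iterable, List


def extract_archives(input_dir: str, names: Iterable[str], dest: str = ".") -> List[str]:
    """Extract <input_dir>/<name>.zip into `dest` for each name.

    Hypothetical helper; returns the sorted contents of `dest` afterwards,
    mirroring the print(os.listdir(...)) check in the answer.
    """
    os.makedirs(dest, exist_ok=True)
    for name in names:
        archive = os.path.join(input_dir, name + ".zip")
        with zipfile.ZipFile(archive, "r") as zf:
            zf.extractall(dest)
    return sorted(os.listdir(dest))
```

On Kaggle this would be called as extract_archives("../input/aerial-cactus-identification", ["train", "test"], "."), since the notebook's working directory is where writable output goes.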
I read the CSVs earlier:
traindf=pd.read_csv('../input/aerial-cactus-identification/train.csv',dtype=str)
testdf=pd.read_csv('../input/aerial-cactus-identification/sample_submission.csv',dtype=str)
After extracting the archives, I just use flow_from_dataframe:
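# datagen is not defined in this snippet; subset="training" only splits the data if it was created as ImageDataGenerator(..., validation_split=...)
train_generator=datagen.flow_from_dataframe(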
dataframe=traindf,
directory="../working/train/",
x_col="id",
y_col="has_cactus",
subset="training",
batch_size=32,
seed=42,
shuffle=True,
class_mode="binary",
target_size=(150,150))
My notebook for it is public and is here
Upvotes: 0
Reputation: 1252
The images are already unzipped in the directory ../input/train_images.
Run this in your kernel:
from os import listdir
listdir('../input/train_images/')
Use ImageDataGenerator.flow_from_directory() to use the images in the directory with your generator.
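flow_from_directory infers class labels from subdirectory names, one subfolder per class, so a flat folder of images such as train_images gives it no labels to read. One way to use it here is to copy the images into per-class folders based on the CSV. This is a minimal sketch: split_into_class_dirs is a hypothetical helper, the id_code and diagnosis column names come from the question, and the .png extension is an assumption. Since ../input is typically read-only on Kaggle, the destination should be under the working directory.

```python
import csv
import os
import shutil
from typing import Dict


def split_into_class_dirs(csv_path: str, src_dir: str, dest_dir: str,
                          id_col: str = "id_code", label_col: str = "diagnosis",
                          extension: str = ".png") -> Dict[str, int]:
    """Copy each image into dest_dir/<label>/ so flow_from_directory can read labels.

    Hypothetical helper; `extension` is an assumption about how files are named.
    Returns the number of images copied per label.
    """
    counts: Dict[str, int] = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            label = row[label_col]
            class_dir = os.path.join(dest_dir, label)
            os.makedirs(class_dir, exist_ok=True)
            filename = row[id_col] + extension
            shutil.copy(os.path.join(src_dir, filename), os.path.join(class_dir, filename))
            counts[label] = counts.get(label, 0) + 1
    return counts
```

After something like split_into_class_dirs('../input/train.csv', '../input/train_images', '../working/train_by_class'), the generator could read train_datagen.flow_from_directory('../working/train_by_class', target_size=(150, 150), class_mode='categorical'). Copying trades disk space for keeping the original input untouched; flow_from_dataframe with correct filenames avoids the copy altogether.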
Check Keras docs: https://keras.io/preprocessing/image/#imagedatagenerator-methods
Upvotes: 1