Aigbomian VII

Reputation: 112

How do I load an image dataset using tf.keras.utils.get_file?

I'm working with the CIFAR-10 dataset and need it to be publicly available, so I pushed it to GitLab. I want to load this dataset in my code. After some digging I found an example that uses tf.keras.utils.get_file(), which looked perfect, but when I try to load my dataset I get a NotADirectoryError. The confusing part is that the example I found online loads just fine. Can someone please explain why it wouldn't work for my dataset?

Here's the example I found that works; is_dir() returns True:

import pathlib
import tensorflow as tf

data_root_orig = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root.is_dir())

Here's my dataset that I'm trying to load. Initially it throws "train_data is not a directory"; when I try again it seems to work, but is_dir() is False and I'm unable to get to the files in my dataset.

import pathlib
import tensorflow as tf
data_root_orig = tf.keras.utils.get_file('train',
                                         'https://gitlab.com/StephenAI/osato-file/raw/master/train.zip',
                                         untar=True, archive_format='zip')
data_root = pathlib.Path(data_root_orig)
print(data_root, type(data_root),data_root.is_dir())

Upvotes: 3

Views: 18998

Answers (3)

MichalSzczep

Reputation: 365

import tensorflow as tf
import pathlib

# placeholder URL: point this at your own .zip archive
url = 'https://.zip'
data_dir = tf.keras.utils.get_file('dataset', url, extract=True)
# for a .tgz / .tar.gz archive, pass untar=True instead
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*.jpg')))

With TensorFlow 2 you can also find the downloaded dataset directly in ~/.keras/datasets and use it however you want.
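A minimal sketch of checking that cache, assuming the default location (get_file writes to ~/.keras/datasets unless you pass cache_dir):

import pathlib

# default Keras download cache used by tf.keras.utils.get_file
cache_dir = pathlib.Path.home() / '.keras' / 'datasets'
for entry in sorted(cache_dir.iterdir()):
    print(entry.name)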

Docs: tf.keras.utils.get_file

Upvotes: 1

Ridouane Hadj Aissa

Reputation: 21

I had the same problem and took a slightly different path; you can do as I did and see if it serves you well. I uploaded the .zip file to my Google Drive account, mounted it in Colab, and then used patoolib.extract_archive(zip_file_path, outdir='destination_folder') and continued coding using the images from destination_folder. Of course, you'll need to install the library with !pip install patool and then import it with import patoolib. There's a sketch of the workflow below.
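A minimal sketch of that workflow, assuming a Colab notebook; the Drive path used here is just a placeholder for wherever you uploaded the archive:

# in Colab, install patool first:
# !pip install patool
from google.colab import drive
import patoolib

# mount Google Drive so the uploaded .zip is visible to the notebook
drive.mount('/content/drive')

# placeholder path: adjust to the location of your archive in Drive
zip_file_path = '/content/drive/MyDrive/train.zip'
patoolib.extract_archive(zip_file_path, outdir='destination_folder')
# the extracted images can now be read from destination_folder/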

Upvotes: 0

Frank Xu

Reputation: 81

# download IMDb movie review dataset
import tensorflow as tf
dataset = tf.keras.utils.get_file(
    fname="aclImdb.tar.gz", 
    origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz", 
    extract=True,
)

Reference: https://github.com/amaiya/ktrain
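As a rough sketch of getting at the extracted files afterwards, assuming the default cache location and that get_file returns the path of the downloaded archive (the usual TF 2.x behaviour):

import pathlib

# dataset points at the cached archive, e.g. ~/.keras/datasets/aclImdb.tar.gz;
# the extracted aclImdb folder typically sits next to it
data_dir = pathlib.Path(dataset).parent / 'aclImdb'
print(data_dir.is_dir())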

Upvotes: 1
