Reputation: 65
As per the example in https://keras.io/examples/generative/cyclegan/, a pre-existing dataset has been loaded for implementation. I am trying to add my dataset.
import tensorflow_datasets as tfds
data = tfds.folder_dataset.ImageFolder('Images', shape=(256, 256, 3))
ds = data.as_dataset()
where 'Images' is the root folder containing two subfolders, train and test; the train folder contains trainA and trainB, and the test folder contains testA and testB.
However, I am unable to understand how to access trainA, trainB, testA and testB so that they are accepted by the Keras CycleGAN example.
Upvotes: 3
Views: 1026
Reputation: 11
Can't write a comment yet, but I think this may help others: kosas' pipeline was working for me (I did some optional renaming for my use case), but I couldn't load the dataset with the current TensorFlow example for CycleGAN (https://www.tensorflow.org/tutorials/generative/cyclegan)
I used
tfds.load("Soiled")
and I got the error message that a 'label' was not found. I found a solution (TypeError: tf__normalize_img() missing 1 required positional argument: 'label') which states that you have to use
tfds.load("Soiled", as_supervised=True)
as otherwise the data is loaded as a dictionary and not as the needed tuple of (image, label).
This addition worked for me.
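To illustrate the difference with a small self-contained sketch (synthetic tensors stand in for the dataset, since Soiled has to be built locally): with as_supervised=True each element is an (image, label) tuple, which matches the two-argument signature the tutorial's normalize_img expects; with the default, each element is a dict and the same map call fails.

```python
import tensorflow as tf

# Same shape as the tutorial's preprocessing function: it takes the
# element as two positional arguments, so it only works on tuples.
def normalize_img(image, label):
    image = tf.cast(image, tf.float32) / 127.5 - 1.0  # scale to [-1, 1]
    return image, label

images = tf.zeros([2, 4, 4, 3], tf.uint8)
labels = tf.constant([0, 1], tf.int64)

# What as_supervised=True yields: (image, label) tuples -> map works.
tuple_ds = tf.data.Dataset.from_tensor_slices((images, labels))
image, label = next(iter(tuple_ds.map(normalize_img)))

# What the default yields: dict elements -> the same map raises the
# TypeError quoted above, because 'label' is never passed positionally.
dict_ds = tf.data.Dataset.from_tensor_slices({'image': images,
                                              'label': labels})
try:
    dict_ds.map(normalize_img)
except TypeError as err:
    print(type(err).__name__)
```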
Upvotes: 1
Reputation: 272
I curated/wrote the whole code here:
https://github.com/asokraju/Soiled
and added a README file with specific instructions on how to use it. Hope this is helpful.
1. Organize the data set inside a Data.zip file with subfolders
trainA
trainB
testA
testB
where A and B represent the two classes.
2. Provide the path (of the Data.zip file) in line 28 of Soiled.py, i.e.,
_DL_URLS = {"Soiled": "C:\\Users\\<user>\\Downloads\\Data_001.zip"}
3. cd into the Soiled folder and use the tfds build command to build the data.
4. The TensorFlow record files can then be found at C:\Users\<user>\tensorflow_datasets\soiled. If needed, these files can be taken elsewhere to use.
There are multiple ways to use the built data. One way:
import tensorflow as tf
import tensorflow_datasets as tfds
import sys
Ensure that the Soiled folder containing the code (NOT the generated data) is accessible to the code. For this I have added its path as follows:
sys.path.insert(1, 'C:\\Users\\<user>\\Downloads\\')
ds = tfds.load('Soiled')
ds
{'trainA': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
'trainB': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
'testA': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
'testB': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>}
next(iter(ds['trainA']))
{'image': <tf.Tensor: shape=(1200, 1920, 3), dtype=uint8, numpy=
array([[[255, 255, 255],
[255, 255, 255],
[255, 255, 255],
...,
[115, 173, 187],
[112, 174, 197],
[108, 172, 199]],
[[255, 255, 255],
[255, 255, 255],
[255, 255, 255],
...,
[119, 170, 191],
[115, 165, 192],
[117, 168, 197]],
[[255, 255, 255],
[255, 255, 255],
[255, 255, 255],
...,
[109, 145, 179],
[134, 162, 199],
[134, 158, 194]],
...
...,
[ 72, 95, 67],
[ 78, 99, 66],
[ 79, 99, 62]]], dtype=uint8)>,
'label': <tf.Tensor: shape=(), dtype=int64, numpy=0>}
How it is done: install the tensorflow_datasets package and use the tfds new Soiled command. This will create a Soiled folder with the file structure
checksums.tsv
dummy_data/
Soiled.py
Soiled_test.py
Modify Soiled.py as needed.
Troubleshooting: if the build fails, delete the generated tensorflow_datasets folder BEFORE you retry. In Windows it can be found at C:\Users\<user>.
If you get an error like
tensorflow.python.framework.errors_impl.NotFoundError: Could not find directory C:\Users\<user>\tensorflow_datasets\downloads\extracted\ZIP.Users_kkosara_Downloads_Data_18r38_Co4F-G6ka9wRk2wGFbDPqLZu8TekEV7s9L9enI.zip\testA\trainA
try changing the data_dirs in Soiled.py to path_to_dataset or something that ensures it has the correct path to the downloaded data. Also check that the Data.zip file is organized as described in step 1 (trainA, trainB, testA, testB, where A and B represent the two classes), and ensure that there is nothing except the image files inside those folders.
Upvotes: -2
Reputation: 10366
Best practice is to write your own TensorFlow dataset; you can do so with the TFDS CLI (command line interface).
pip install -q tfds-nightly
cd path/to/my/project/datasets/
tfds new my_dataset
Edit my_dataset/my_dataset.py to implement your dataset, then build it:
cd my_dataset/
tfds build
Within your project you then need to import your dataset
import my.project.datasets.my_dataset
and access it as you would any other tfds dataset:
ds = tfds.load('my_dataset')
The TensorFlow documentation for adding a dataset can be found here.
Upvotes: 2