Saksham

Reputation: 65

How to load custom data into tfds for keras cyclegan example?

As per the example at https://keras.io/examples/generative/cyclegan/, a pre-existing dataset is loaded for the implementation. I am trying to use my own dataset instead.

import tensorflow_datasets as tfds
data = tfds.folder_dataset.ImageFolder('Images', shape=(256, 256, 3))
ds = data.as_dataset()

where 'Images' is the root folder containing two subfolders, train and test; train contains trainA and trainB, and test contains testA and testB.

However, I do not understand how to access trainA, trainB, testA, and testB so that the data is accepted by the Keras CycleGAN example.
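(For reference: tfds.folder_dataset.ImageFolder infers splits from the first directory level and labels from the second, i.e. Images/<split>/<label>/image. A stdlib-only sketch of that mapping, using hypothetical dummy files in place of images:)

```python
from pathlib import Path
import tempfile

# Build the directory layout described above (dummy files instead of images).
root = Path(tempfile.mkdtemp()) / "Images"
for split, label in [("train", "trainA"), ("train", "trainB"),
                     ("test", "testA"), ("test", "testB")]:
    d = root / split / label
    d.mkdir(parents=True)
    (d / "img0.jpg").touch()

# ImageFolder-style interpretation: first level = split, second = label.
mapping = {}
for img in sorted(root.glob("*/*/*.jpg")):
    split, label = img.relative_to(root).parts[:2]
    mapping.setdefault(split, set()).add(label)

print({s: sorted(v) for s, v in sorted(mapping.items())})
# {'test': ['testA', 'testB'], 'train': ['trainA', 'trainB']}
```

So with this layout the splits are train/test, and trainA/trainB become integer labels rather than separate splits.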

Upvotes: 3

Views: 1026

Answers (3)

m.vobe

Reputation: 11

I can't write a comment yet, but I think this may help others: kosa's pipeline worked for me, with some optional renamings for my use case. However, I couldn't load the dataset with the current TensorFlow example for CycleGAN (https://www.tensorflow.org/tutorials/generative/cyclegan). I used tfds.load("Soiled") and got an error message saying a 'label' was not found. I found a solution (TypeError: tf__normalize_img() missing 1 required positional argument: 'label') which states that you have to use tfds.load("Soiled", as_supervised=True); otherwise the data is loaded as a dictionary rather than the needed tuple of (image, label). This addition worked for me.
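The dict-vs-tuple point can be sketched without TensorFlow (normalize_img here is a stand-in for the tutorial's mapping function):

```python
# Stand-in for the tutorial's normalize_img; tf.data's map() calls it
# with the element's components as positional arguments.
def normalize_img(image, label):
    return image / 255.0, label

# as_supervised=False yields dict elements -> only one positional argument.
dict_element = {"image": 255, "label": 0}
try:
    normalize_img(dict_element)
except TypeError as err:
    print(err)  # ... missing 1 required positional argument: 'label'

# as_supervised=True yields (image, label) tuples -> two arguments.
image, label = normalize_img(*(255, 0))
print(image, label)  # 1.0 0
```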

Upvotes: 1

kosa

Reputation: 272

I curated/wrote the whole code here

https://github.com/asokraju/Soiled

and added a README file with specific how-to instructions. Hope this is helpful.

Custom TensorFlow Input Pipeline for CycleGANs

Steps to create the dataset

  1. Organize the data set inside a Data.zip file

    trainA
    trainB
    testA
    testB
    

    A and B represent the two classes.

  2. Provide the path ( of the Data.zip file ) in line 28 of Soiled.py i.e.,

    _DL_URLS = {"Soiled": "C:\\Users\\<user>\\Downloads\\Data_001.zip"}
    
  3. cd into the Soiled folder and run the tfds build command to build the data

  4. The TensorFlow record files can be found at C:\Users\<user>\tensorflow_datasets\soiled. If needed, these files can be moved elsewhere for use.
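The four steps above can be summarized as a command fragment (paths are illustrative; assumes tensorflow_datasets is installed):

```shell
# 1. Zip the four folders (trainA/trainB/testA/testB) into Data.zip
# 2. Point _DL_URLS in Soiled.py at that zip file, then:
cd path/to/Soiled
tfds build    # writes the TFRecord files under <user>/tensorflow_datasets/soiled
```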

Loading the data

There are multiple ways to do it.

  1. Import the necessary packages:
    import tensorflow as tf
    import tensorflow_datasets as tfds
    import sys
    
  2. Ensure that the path to the Soiled folder containing the code, NOT the generated data, is accessible to the code. For this I have added the path as follows:
    sys.path.insert(1, 'C:\\Users\\<user>\\Downloads\\')
    
  3. Then the data can be loaded using:
    ds = tfds.load('Soiled')
    ds
    
    {'trainA': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
    'trainB': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
    'testA': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>,
    'testB': <PrefetchDataset shapes: {image: (None, None, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>}
    
  4. test:
    next(iter(ds['trainA']))
    
    {'image': <tf.Tensor: shape=(1200, 1920, 3), dtype=uint8, numpy=
    array([[[255, 255, 255],
            [255, 255, 255],
            [255, 255, 255],
            ...,
            [115, 173, 187],
            [112, 174, 197],
            [108, 172, 199]],
    
            [[255, 255, 255],
            [255, 255, 255],
            [255, 255, 255],
            ...,
            [119, 170, 191],
            [115, 165, 192],
            [117, 168, 197]],
    
            [[255, 255, 255],
            [255, 255, 255],
            [255, 255, 255],
            ...,
            [109, 145, 179],
            [134, 162, 199],
            [134, 158, 194]],
    
    ...
            ...,
            [ 72,  95,  67],
            [ 78,  99,  66],
            [ 79,  99,  62]]], dtype=uint8)>,
    'label': <tf.Tensor: shape=(), dtype=int64, numpy=0>}
    

Steps used to create the folder structure.

  1. Install the tensorflow_datasets package
  2. On the command line, type tfds new Soiled. This will create a Soiled folder with the file structure
    checksums.tsv
    dummy_data/
    Soiled.py
    Soiled_test.py
    
  3. Edit Soiled.py as needed.

Possible issues:

  1. If it fails to build the pipeline, delete the tensorflow_datasets folder BEFORE you retry. On Windows it can be found at C:\Users\<user>.
  2. If it gives an error similar to
    # tensorflow.python.framework.errors_impl.NotFoundError: Could not find directory C:\Users\<user>\tensorflow_datasets\downloads\extracted\ZIP.Users_kkosara_Downloads_Data_18r38_Co4F-G6ka9wRk2wGFbDPqLZu8TekEV7s9L9enI.zip\testA\trainA
    
    try changing the data_dirs in the code to path_to_dataset, or something else that ensures they point to the correct path of the downloaded data.
  3. Ensure that the folder structure is correct: the `Data.zip` file should contain
        trainA
        trainB
        testA
        testB
     where A and B represent the two classes. Also ensure that there is nothing except the image files inside these folders.

Used Resources

  1. How to load custom data into tfds for keras cyclegan example?
  2. https://www.tensorflow.org/datasets/cli
  3. https://www.tensorflow.org/datasets/catalog/cycle_gan
  4. https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/generative/cyclegan.ipynb#scrollTo=Ds4o1h4WHz9U

Upvotes: -2

mrk

Reputation: 10366

Best practice is to write your own TensorFlow dataset

you can do so with the TFDS CLI (command line interface).

  1. Install the TFDS CLI: pip install -q tfds-nightly
  2. Navigate into the directory of your dataset: cd path/to/my/project/datasets/
  3. Create a new dataset: tfds new my_dataset
  4. [...] Manually modify my_dataset/my_dataset.py to implement your dataset.
  5. Navigate into your new dataset: cd my_dataset/
  6. Build your new TFDS dataset: tfds build

Within your project you then need to import your dataset

import my.project.datasets.my_dataset 

and access it as you would any other tfds dataset:

ds = tfds.load('my_dataset')

See the TensorFlow documentation on adding a dataset for more detail.
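The steps above as one command fragment (my_dataset and the project path are the answer's placeholders):

```shell
pip install -q tfds-nightly          # TFDS CLI
cd path/to/my/project/datasets/
tfds new my_dataset                  # scaffolds my_dataset/my_dataset.py
# ... manually edit my_dataset/my_dataset.py to implement the dataset ...
cd my_dataset/
tfds build                           # generates the TFRecord files
```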

Upvotes: 2
