Obiii

Reputation: 834

ImageDataGenerator for multi task output in Keras using flow_from_directory

I am building a multitask CNN model with two classification tasks (one with 10 classes, the other with 5 classes). My directory structure looks like this:

    -Train
       - image1.jpg
         ...
       - imageN.jpg

    -Test
       - image1.jpg
         ...
       - imageN.jpg

    -Vald
       - image1.jpg
         ...
       - imageN.jpg

trainLabel is a dataframe containing the columns Image, PFRType, and FuelType.

I am trying to use flow_from_dataframe and my generators are:

trainGen = ImageDataGenerator()
trainGenDf = trainGen.flow_from_dataframe(trainLabel,
                                         directory = '../MTLData/train/',
                                         x_col = "Image",y_col=["PFRType","FuelType"],
                                         class_mode='multi_output',
                                         target_size=(224,224),
                                         batch_size=32)

The error I get is: Error when checking target: expected PFR to have shape (10,) but got array with shape (1,)

PFR is the output layer of one subtask, with 10 classes.

Here is the model diagram.

Upvotes: 1

Views: 1449

Answers (2)

Obiii

Reputation: 834

I used a custom generator function. It doesn't support shuffling so far.

import numpy as np
from PIL import Image
from keras.utils import to_categorical

def get_data_generator(data, split, batch_size=16):
    # Map each split name to its image directory; the split name also
    # selects the matching rows through the data.dir column.
    image_dirs = {'train': '../MTLData/train/',
                  'test': '../MTLData/test/',
                  'vald': '../MTLData/vald/'}
    imagePath = image_dirs[split]
    df = data[data.dir == split]

    # Class counts come from the full dataframe so that every split
    # produces one-hot vectors of the same width.
    pfrID = len(data.PFRType.unique())
    ftID = len(data.FuelType.unique())
    images, pfrs, fts = [], [], []
    while True:  # loop forever; Keras stops after steps_per_epoch batches
        for i in range(df.shape[0]):
            r = df.iloc[i]
            file, pfr, ft = r['Image'], r['PFRType'], r['FuelType']
            im = Image.open(imagePath + file)
            im = im.resize((224, 224))
            im = np.array(im) / 255.0  # scale pixel values to [0, 1]
            images.append(im)
            # to_categorical expects integer-coded labels (0 .. n-1)
            pfrs.append(to_categorical(pfr, pfrID))
            fts.append(to_categorical(ft, ftID))
            if len(images) >= batch_size:
                yield np.array(images), [np.array(pfrs), np.array(fts)]
                images, pfrs, fts = [], [], []
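
For the missing shuffle, one possible approach is to draw a fresh permutation of the row positions at the start of every pass and read rows with df.iloc in that order. This is only a minimal sketch: shuffled_index_batches is a hypothetical helper, not part of Keras, and unlike the generator above it drops the last partial batch of each pass for simplicity.

```python
import numpy as np

def shuffled_index_batches(num_rows, batch_size, seed=None):
    """Yield arrays of row positions forever, reshuffled on every pass."""
    rng = np.random.default_rng(seed)
    while True:
        # New random order for each pass over the data
        order = rng.permutation(num_rows)
        # Step through the permutation one full batch at a time;
        # a trailing partial batch is skipped
        for start in range(0, num_rows - batch_size + 1, batch_size):
            yield order[start:start + batch_size]
```

Inside the generator, each yielded array could replace the plain range loop, e.g. iterating over the positions in a batch and calling df.iloc[i] for each one.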

Upvotes: 0

ITiger

Reputation: 1081

You can use flow_from_dataframe. You just need to parse the CSV files containing your labels into a pandas dataframe that maps each filename to its corresponding labels.

For instance, if your dataframe looks like this:

| image_path | label_task_a | label_task_b | subset |
|------------|--------------|--------------|--------|
| image1.jpg | foo          | bla          | Train  |
| ...        | ...          | ...          | ...    |
| imageN.jpg | baz          | whatever     | Vald   |

You can create one generator for each subset:

train_generator_task_a = datagen.flow_from_dataframe(
  dataframe=df[df.subset == 'Train'],
  directory='data/Train',
  x_col='image_path',
  y_col=['label_task_a', 'label_task_b'], # outputs for both tasks.
  batch_size=32,
  seed=42,
  shuffle=True,
  class_mode='categorical')

Edit 1:

Regarding your error: if you set class_mode='sparse', Keras expects the labels to be 1D NumPy arrays of integer labels. Have you tried setting class_mode='multi_output'?
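
As far as I know, class_mode='multi_output' passes the column values through as they are, without one-hot encoding them, which would explain a target of shape (1,) where the model expects (10,). One possible workaround is to wrap the generator and one-hot encode each output yourself. This is a rough sketch, assuming the labels are already integer-coded from 0 to n-1; one_hot_outputs is a hypothetical helper, not a Keras function. Alternatively, keeping integer labels and compiling the model with sparse_categorical_crossentropy may avoid the conversion entirely.

```python
import numpy as np

def one_hot_outputs(generator, num_classes):
    """Wrap a generator yielding (images, [labels_a, labels_b, ...])
    and one-hot encode each label array.

    num_classes lists the class count per output, in the same order.
    Assumes integer-coded labels in the range 0 .. n-1.
    """
    for images, label_list in generator:
        encoded = [
            # Row i of the identity matrix is the one-hot vector for class i
            np.eye(n, dtype="float32")[np.asarray(labels, dtype=int)]
            for labels, n in zip(label_list, num_classes)
        ]
        yield images, encoded
```

With the question's setup this would be called as one_hot_outputs(trainGenDf, [10, 5]), giving targets of shape (batch, 10) and (batch, 5).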

Upvotes: 2
