Reputation: 834
I am creating a multitask CNN model with two classification outputs (one with 10 classes, the other with 5 classes). My directory structure looks like this:
- Train
    - image1.jpg
    - ...
    - imageN.jpg
- Test
    - image1.jpg
    - ...
    - imageN.jpg
- Vald
    - image1.jpg
    - ...
    - imageN.jpg
trainLabel is a dataframe with the columns Image, PFRType, and FuelType.
I am trying to use flow_from_dataframe, and my generator is:
trainGen = ImageDataGenerator()
trainGenDf = trainGen.flow_from_dataframe(trainLabel,
                                          directory='../MTLData/train/',
                                          x_col="Image", y_col=["PFRType", "FuelType"],
                                          class_mode='multi_output',
                                          target_size=(224, 224),
                                          batch_size=32)
The error I get is: Error when checking target: expected PFR to have shape (10,) but got array with shape (1,)
PFR is the output layer of one subtask, with 10 classes.
Upvotes: 1
Views: 1449
Reputation: 834
I ended up using a custom generator function. It does not support shuffling so far.
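import numpy as np
from PIL import Image
from keras.utils import to_categorical

def get_data_generator(data, split, batch_size=16):
    # Pick the image directory and the dataframe rows for the requested split.
    if split == 'train':
        imagePath = '../MTLData/train/'
        df = data[data.dir == 'train']
    elif split == 'test':
        imagePath = '../MTLData/test/'
        df = data[data.dir == 'test']
    elif split == 'vald':
        imagePath = '../MTLData/vald/'
        df = data[data.dir == 'vald']
    else:
        raise ValueError("split must be 'train', 'test' or 'vald'")
    # Class counts come from the full dataframe so every split
    # uses the same one-hot width.
    pfrID = len(data.PFRType.unique())
    ftID = len(data.FuelType.unique())
    images, pfrs, fts = [], [], []
    while True:  # loop forever; the caller decides how many batches to draw
        for i in range(0, df.shape[0]):
            r = df.iloc[i]
            file, pfr, ft = r['Image'], r['PFRType'], r['FuelType']
            im = Image.open(imagePath + file)
            im = im.resize((224, 224))
            im = np.array(im) / 255.0  # scale pixel values to [0, 1]
            images.append(im)
            # Labels are assumed to be integer class indices.
            pfrs.append(to_categorical(pfr, pfrID))
            fts.append(to_categorical(ft, ftID))
            if len(images) >= batch_size:
                # One image batch and one label array per task.
                yield np.array(images), [np.array(pfrs), np.array(fts)]
                images, pfrs, fts = [], [], []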
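One way shuffling could be added to a generator like this is to walk over a freshly shuffled list of row positions on every pass and read the rows with df.iloc. A minimal sketch using only the standard library; shuffled_batches and its parameters are hypothetical names, not part of Keras:

```python
import random

def shuffled_batches(n_rows, batch_size, seed=None):
    """Yield lists of row positions forever, reshuffled on every pass."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    while True:
        rng.shuffle(order)  # new order at the start of each pass
        for start in range(0, n_rows, batch_size):
            # The last batch of a pass may be shorter than batch_size.
            yield order[start:start + batch_size]
```

Inside get_data_generator, each yielded list could then drive the row lookup (for example `for i in batch: r = df.iloc[i]`), with one call to yield per list instead of the length check.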
Upvotes: 0
Reputation: 1081
You can use flow_from_dataframe.
You just need to parse your csv files containing the labels into a pandas dataframe which maps the filenames to their corresponding labels.
For instance, if your dataframe looks like this:
| image_path | label_task_a | label_task_b | subset |
|------------|--------------|--------------|--------|
| image1.jpg | foo | bla | Train |
| ... | ... | ... | ... |
| imageN.jpg | baz | whatever | Vald |
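A dataframe of this shape could be built from a CSV file roughly as follows. This is only a sketch: the CSV contents are made up for illustration and are read from a string here, whereas in practice you would pass a file path to pd.read_csv.

```python
import io
import pandas as pd

# Hypothetical CSV contents; replace with the path to your own label file.
csv_text = """image_path,label_task_a,label_task_b,subset
image1.jpg,foo,bla,Train
image2.jpg,baz,whatever,Vald
"""

df = pd.read_csv(io.StringIO(csv_text))

# One dataframe per subset, each with a fresh 0..n-1 index.
train_df = df[df.subset == "Train"].reset_index(drop=True)
vald_df = df[df.subset == "Vald"].reset_index(drop=True)
```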
You can create one generator for each subset:
train_generator_task_a = datagen.flow_from_dataframe(
    dataframe=df[df.subset == 'Train'],
    directory='data/Train',
    x_col='image_path',
    y_col=['label_task_a', 'label_task_b'],  # outputs for both tasks.
    batch_size=32,
    seed=42,
    shuffle=True,
    class_mode='categorical')
Regarding your error: if you set class_mode='sparse', Keras expects the labels to be 1D numpy arrays of integer labels. Have you tried setting it to class_mode='multi_output'?
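For what it's worth, the shape (1,) in the error is consistent with class_mode='multi_output' passing through the raw column values for each output rather than one-hot vectors. Assuming the labels are integer class indices and the model is trained with a categorical cross-entropy loss, one possible workaround is to wrap the generator and one-hot encode each output yourself. A sketch with plain NumPy; one_hot_outputs and num_classes are hypothetical names:

```python
import numpy as np

def one_hot_outputs(generator, num_classes):
    """Wrap a generator yielding (x, [y_a, y_b, ...]) with integer labels.

    num_classes lists the class count for each output, in the same order.
    """
    for x, ys in generator:
        encoded = []
        for y, n in zip(ys, num_classes):
            # Accept labels shaped (batch,) or (batch, 1).
            idx = np.asarray(y).astype(int).reshape(-1)
            # Row i of the identity matrix is the one-hot vector for class i.
            encoded.append(np.eye(n, dtype="float32")[idx])
        yield x, encoded
```

With the generator from the question this might be used as `one_hot_outputs(trainGenDf, num_classes=[10, 5])`. Keeping the integer labels and switching the loss to sparse_categorical_crossentropy may be another option, depending on the Keras version.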
Upvotes: 2