Reputation: 1859
I have two CSV files, "train.csv" and "test.csv", which look something like this:
Image_ID | Target |
---|---|
ID_7xd1 | 0 |
ID_8xk1 | 1 |
This is an example of train.csv. test.csv has only the Image_ID column, and the goal is to predict the Target of each of its images from the images provided. The images folder is organized as follows:
Images
├── test
│   ├── ID_12ls.tif
│   ├── ID_1sfk.tif
│   └── ...
└── train
    ├── 0
    │   ├── ID_7xd1.tif
    │   ├── ID_9xd0.tif
    │   └── ...
    └── 1
        ├── ID_0xkd0.tif
        ├── ID_8xdk1.tif
        └── ...
Each Image_ID in train.csv and test.csv represents an image and matches that image's file name. Since I have a lot of images, I decided to use Keras ImageDataGenerator.flow_from_directory:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# data generators (pixels rescaled to [0, 1]; 20% of train/ held out for validation)
datagen_train = ImageDataGenerator(rescale=1./255, validation_split=0.2)
datagen_test = ImageDataGenerator(rescale=1./255)
# load and iterate training dataset
train_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='training')
# load and iterate validation dataset
val_it = datagen_train.flow_from_directory('train/', target_size=(224, 224), class_mode='binary', batch_size=64, seed=0, subset='validation')
# load and iterate test dataset
test_it = datagen_test.flow_from_directory('test/', target_size=(224, 224), class_mode=None, batch_size=1, seed=0)
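For context on what a 0/1 prediction means here: flow_from_directory infers the classes from the subdirectory names under train/, sorts them alphanumerically, and numbers them in that order (the result is exposed as train_it.class_indices). The sketch below reproduces only that numbering step on a plain list of folder names; class_indices_from_names is a hypothetical helper, not a Keras function.

```python
def class_indices_from_names(subdir_names):
    """Hypothetical helper mimicking how flow_from_directory numbers classes:
    subdirectory names are sorted as strings and enumerated in that order."""
    return {name: index for index, name in enumerate(sorted(subdir_names))}


# With the train/0 and train/1 layout, folder "1" becomes label 1,
# so a sigmoid output close to 1.0 means Target == 1.
print(class_indices_from_names(["1", "0"]))  # {'0': 0, '1': 1}

# Sorting is by string, not by number: "10" sorts before "9".
print(class_indices_from_names(["9", "10"]))  # {'10': 0, '9': 1}
```

Because the folder names in this layout are the single digits 0 and 1, the Keras label coincides with the Target value in train.csv.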
Model:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense
model2 = Sequential()
model2.add(Conv2D(32, 3, padding="valid", activation="relu", input_shape=(224, 224, 3)))
model2.add(MaxPool2D())
model2.add(Dropout(0.4))
model2.add(Flatten())
model2.add(Dense(128, activation="relu"))
model2.add(Dense(1, activation="sigmoid"))
opt = tf.keras.optimizers.Adam(learning_rate=0.000001)
model2.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
# callbacks
from tensorflow.keras.callbacks import ModelCheckpoint
mc_loss = ModelCheckpoint('model2svd.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)
history2 = model2.fit_generator(generator=train_it, steps_per_epoch=step_size_t, validation_data=val_it, validation_steps=step_size_v,
                                epochs=100, shuffle=True, callbacks=[mc_loss])
After training the model with model.fit_generator(), I made predictions on the test dataset with model.predict_generator(). It gave me an array of m predictions, where m is the total number of test examples.
The problem is: how do I map this output to the Image_ID values in test.csv? Or is the output already in the same order as the Image_IDs in test.csv?
Please let me know if you need more details.
Upvotes: 0
Views: 580
Reputation: 8102
In your test generator, set shuffle=False (flow_from_directory shuffles by default). Also, model.predict_generator is deprecated, so just use model.predict. With shuffle=False in the test generator, you can get the list of image files in the same order the predictions are produced with:
test_files=test_it.filenames
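The entries of filenames are paths relative to the directory passed to flow_from_directory, so they include the subfolder and the .tif extension and will not match the bare Image_ID values in test.csv. A minimal sketch for stripping them, assuming (as the question states) that the file name without its extension is exactly the Image_ID; the helper name and the images/ subfolder in the example are hypothetical:

```python
import os


def filenames_to_image_ids(filenames):
    """Hypothetical helper: 'images/ID_12ls.tif' -> 'ID_12ls'.
    Keeps the input order, so the result still lines up with the predictions."""
    return [os.path.splitext(os.path.basename(path))[0] for path in filenames]


print(filenames_to_image_ids(["images/ID_12ls.tif", "images/ID_1sfk.tif"]))
# ['ID_12ls', 'ID_1sfk']
```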
To make sure you go through the test samples EXACTLY once, choose the test batch size and the number of test steps so that test_batch_size * test_steps = number of test samples, using the code below:
length = len(test_files)
test_batch_size = sorted([int(length/n) for n in range(1, length+1) if length % n == 0 and length/n <= 80], reverse=True)[0]
test_steps = int(length/test_batch_size)
print('test batch size: ', test_batch_size, ' test steps: ', test_steps)
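The one-liner above picks the largest divisor of the number of test samples that is at most 80, so batch size times steps covers every sample exactly once. The same logic written as a plain loop, as a sketch (the function name is hypothetical; the cap of 80 mirrors the snippet above and is otherwise arbitrary):

```python
def largest_batch_size(n_samples, max_batch=80):
    """Return the largest divisor of n_samples that is <= max_batch."""
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    # Count down from the cap; the first exact divisor is the largest one.
    for batch in range(min(n_samples, max_batch), 0, -1):
        if n_samples % batch == 0:
            return batch


for n in (1000, 60, 97):
    b = largest_batch_size(n)
    print(n, "->", "batch", b, "steps", n // b)
# 1000 -> batch 50 steps 20
# 60 -> batch 60 steps 1
# 97 -> batch 1 steps 97
```

For a prime number of test images the batch size falls back to 1, which is still correct, just slower.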
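then create the test generator with that batch size and shuffle=False (for generator input, the batch size comes from the generator itself, not from the batch_size argument of predict) and predict:
test_it = datagen_test.flow_from_directory('test/', target_size=(224, 224), class_mode=None, batch_size=test_batch_size, shuffle=False)
preds = model.predict(test_it, steps=test_steps)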
Then iterate through preds and convert each predicted probability into a 0/1 label using a 0.5 threshold:
import pandas as pd
labels = []
for p in preds:
    # p holds the sigmoid probability that the image belongs to class 1
    if p > .5:
        label = 1
    else:
        label = 0
    labels.append(label)
Fseries = pd.Series(test_files, name='Image_ID')
Lseries = pd.Series(labels, name='Target')
predictions_df = pd.concat([Fseries, Lseries], axis=1)
print(predictions_df.head())
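If the rows must come out in the same order as test.csv (for example for a submission file), look each prediction up by ID instead of relying on file order. A stdlib-only sketch, assuming the IDs have already been reduced to bare Image_ID strings (folder and .tif extension stripped) and that test.csv has an Image_ID column as shown in the question; align_to_test_csv is a hypothetical name:

```python
import csv


def align_to_test_csv(test_csv_lines, image_ids, labels):
    """Return (Image_ID, Target) pairs in the row order of test.csv.

    test_csv_lines: any iterable of CSV text lines, e.g. an open file.
    image_ids, labels: parallel lists in prediction (file) order.
    """
    pred_by_id = dict(zip(image_ids, labels))
    rows = []
    for row in csv.DictReader(test_csv_lines):
        image_id = row["Image_ID"]
        # Raises KeyError if an ID in test.csv has no matching prediction.
        rows.append((image_id, pred_by_id[image_id]))
    return rows


csv_lines = ["Image_ID", "ID_1sfk", "ID_12ls"]
print(align_to_test_csv(csv_lines, ["ID_12ls", "ID_1sfk"], [0, 1]))
# [('ID_1sfk', 1), ('ID_12ls', 0)]
```

With the real file, pass an open handle instead of the list, e.g. `with open('test.csv', newline='') as f: rows = align_to_test_csv(f, ids, labels)`.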
Upvotes: 1