Reputation: 1084
I have found a neural network for semantic segmentation. The network works just fine; I feed it my training, validation and test data and I get the output (segmented parts in different colors). Up to this point, all is OK. I am using Keras with TensorFlow 1.7.0, GPU enabled. Python version is 3.5.
What I want to achieve, though, is to get access to the pixel groups (segments) so that I can get their boundaries' image coordinates, i.e. an array of points that forms the boundary of segment X, shown in green in the prediction image.
How can I do that? Obviously I cannot put the entire code here, but here is a snippet that I should modify to achieve what I want:
I have the following in my evaluate function:
def evaluate(model_file):
    net = load_model(model_file, custom_objects={'iou_metric': create_iou_metric(1 + len(PART_NAMES)),
                                                 'acc_metric': create_accuracy_metric(1 + len(PART_NAMES), output_mode='pixelwise_mean')})
    img_size = net.input_shape[1]
    image_filename = lambda fp: fp + '.jpg'
    d_test_x = TensorResize((img_size, img_size))(ImageSource(TEST_DATA, image_filename=image_filename))
    d_test_x = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_test_x)
    d_test_pred = Predict(net)(d_test_x)
    d_test_pred.metadata['properties'] = ['background'] + PART_NAMES

    d_x, d_y = process_data(VALIDATION_DATA, img_size)
    d_x = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_x)
    d_y = AddBackgroundMap(use_lane_names=['Y'])(d_y)
    d_train = Join()([d_x, d_y])
    print('losses:', net.evaluate_generator(d_train.batch_array_tuple_generator(batch_size=3), 3))

    # the tensor which needs to be modified
    pred_y = Predict(net)(d_x)
    Visualize(('slices', 'labels'))(Join()([d_test_x, d_test_pred]))
    Visualize(('slices', 'labels', 'labels'))(Join()([d_x, pred_y, d_y]))
As for the Predict function, here is the snippet:
Alternatively, I've found that by using the following, one can get access to the tensor:
# for sample_img, in d_x.batch_array_tuple_generator(batch_size=3, n_samples=5):
#     aa = net.predict(sample_img)
#     indexes = np.argmax(aa, axis=3)
#     print(indexes)
#     import pdb
#     pdb.set_trace()
But I have no idea how this works; I've never used pdb before.
In case anyone also wants to see the training function, here it is:
def train(model_name='refine_res', k=3, recompute=False, img_size=224,
          epochs=10, train_decoder_only=False, augmentation_boost=2, learning_rate=0.001,
          opt='rmsprop'):
    print("Training on: " + str(PART_NAMES))
    print("In Total: " + str(1 + len(PART_NAMES)) + " parts.")
    metrics = [create_iou_metric(1 + len(PART_NAMES)),
               create_accuracy_metric(1 + len(PART_NAMES), output_mode='pixelwise_mean')]

    if model_name == 'dummy':
        net = build_dummy((224, 224, 3), 1 + len(PART_NAMES))  # 1+ because of the background class
    elif model_name == 'refine_res':
        net = build_resnet50_upconv_refine((img_size, img_size, 3), 1 + len(PART_NAMES), k=k, optimizer=opt,
                                           learning_rate=learning_rate, softmax_top=True,
                                           objective_function=categorical_crossentropy,
                                           metrics=metrics, train_full=not train_decoder_only)
    elif model_name == 'vgg_upconv':
        net = build_vgg_upconv((img_size, img_size, 3), 1 + len(PART_NAMES), k=k, optimizer=opt,
                               learning_rate=learning_rate, softmax_top=True,
                               objective_function=categorical_crossentropy, metrics=metrics,
                               train_full=not train_decoder_only)
    else:
        net = load_model(model_name)

    d_x, d_y = process_data(TRAINING_DATA, img_size, recompute=recompute, ignore_cache=False)
    d = Join()([d_x, d_y])

    # create more samples by rotating top view images and translating
    images_to_be_rotated = {}
    factor = 5
    for root, dirs, files in os.walk(TRAINING_DATA, topdown=False):
        for name in dirs:
            format = str(name + '/' + name)  # construct the format of foldername/foldername
            images_to_be_rotated.update({format: factor})

    d_aug = ImageAugmentation(factor_per_filepath_prefix=images_to_be_rotated, rotation_variance=90, recalc_base_seed=True)(d)
    d_aug = ImageAugmentation(factor=3 * augmentation_boost, color_interval=0.03, shift_interval=0.1, contrast=0.4, recalc_base_seed=True, use_lane_names=['X'])(d_aug)
    d_aug = ImageAugmentation(factor=2, rotation_variance=20, recalc_base_seed=True)(d_aug)
    d_aug = ImageAugmentation(factor=7 * augmentation_boost, rotation_variance=10, translation=35, mirror=True, recalc_base_seed=True)(d_aug)

    # apply augmentation on the images of the training dataset only
    d_aug = AddBackgroundMap(use_lane_names=['Y'])(d_aug)
    d_aug.metadata['properties'] = ['background'] + PART_NAMES

    # subtract mean and shuffle
    d_aug = Shuffle()(d_aug)
    d_aug, d_val = RandomSplit(0.8)(d_aug)
    d_aug = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_aug)
    d_val = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_val)

    # Visualize()(d_aug)
    d_aug.configure()
    d_val.configure()
    print('training size:', d_aug.size())

    batch_size = 4
    callbacks = []
    # callbacks += [EarlyStopping(patience=10)]
    callbacks += [ModelCheckpoint(filepath="trained_models/" + model_name + '.hdf5', monitor='val_iou_metric', mode='max',
                                  verbose=1, save_best_only=True)]
    callbacks += [CSVLogger('logs/' + model_name + '.csv')]
    history = History()
    callbacks += [history]

    # sess = K.get_session()
    # sess.run(tf.initialize_local_variables())
    net.fit_generator(d_aug.batch_array_tuple_generator(batch_size=batch_size, shuffle_samples=True),
                      steps_per_epoch=d_aug.size() // batch_size,
                      validation_data=d_val.batch_array_tuple_generator(batch_size=batch_size),
                      validation_steps=d_val.size() // batch_size,
                      callbacks=callbacks, epochs=epochs)

    return {k: (max(history.history[k]), min(history.history[k])) for k in history.history.keys()}
Upvotes: 4
Views: 1080
Reputation: 2322
For segmentation tasks, assuming that your batch is one image, each pixel in the image is assigned a probability of belonging to each class. Suppose you have 5 classes and the image has 784 pixels (28x28); from net.predict you will get
an array of shape (784,5)
Each of the 784 pixels is assigned 5 probability values, one per class. When you do
np.argmax(aa, axis=3)
(argmax over the class axis, which is the last one when the prediction keeps its batch and spatial dimensions), you get the index of the highest probability for each pixel, which would be of shape (784,). You can then reshape it to 28x28 with
indexes.reshape(28,28)
and you get the mask of your predictions.
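Put together as a minimal, self-contained sketch (the random aa here is only a stand-in for the real net.predict output, and the 28x28 / 5-class shapes are assumptions):

import numpy as np

aa = np.random.rand(784, 5)           # stand-in for net.predict on one 28x28 image, 5 classes
indexes = np.argmax(aa, axis=-1)      # per-pixel class index, shape (784,)
mask = indexes.reshape(28, 28)        # 28x28 prediction mask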
Reducing the problem to a 7x7 array with 4 classes (0-3), that looks like
array([[2, 1, 0, 1, 2, 3, 1],
[3, 1, 1, 0, 3, 0, 0],
[3, 3, 2, 2, 0, 3, 1],
[1, 1, 0, 3, 1, 3, 1],
[0, 0, 0, 3, 3, 1, 0],
[1, 2, 3, 0, 1, 2, 3],
[0, 2, 1, 1, 0, 1, 3]])
Say you want to extract the indexes where the model predicted class 1:
segment_1 = np.where(indexes == 1)
Since indexes is a 2-dimensional array, segment_1 will be a tuple of two arrays, where the first array holds the row indices and the second the column indices:
(array([0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6]), array([1, 3, 6, 1, 2, 6, 0, 1, 4, 6, 5, 0, 4, 2, 3, 5]))
Looking at the first number in the first and the second array, 0 and 1: they point to where the first match is located in indexes (row 0, column 1).
You can extract the values like this:
indexes[segment_1]
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
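If you need those positions as an explicit array of points (which is what the question asks for), the tuple returned by np.where zips directly into (row, col) pairs; a one-line sketch using the segment_1 from above:

points = list(zip(*segment_1))        # e.g. [(0, 1), (0, 3), (0, 6), ...]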
You can then proceed with the second class you want to get, let's say 2:
segment_2 = np.where(indexes == 2)
segment_2
(array([0, 0, 2, 2, 5, 5, 6]), array([0, 4, 2, 3, 1, 5, 1]))
And if you want a mask for each class by itself, you can create a copy of indexes for each class (4 copies in total), e.g. class_1 = indexes.copy() (use .copy(), otherwise you would overwrite indexes itself), and set to zero any value that is not equal to 1: class_1[class_1 != 1] = 0. You get something like this:
array([[0, 1, 0, 1, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0]])
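A short sketch of that per-class loop (class_masks is just an illustrative name; class 0 is skipped because zeroing out non-matching values is degenerate for the 0 label):

class_masks = {}
for c in range(1, 4):                 # classes 1-3 from the toy example
    m = indexes.copy()                # .copy() keeps indexes itself intact
    m[m != c] = 0                     # keep only pixels predicted as class c
    class_masks[c] = m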
To the eye it may look like there are contours, but from this example you can tell that there is no clear contour for each segment. The only way I could think of is to loop over the image in rows and record where the value changes, and to do the same in columns; a rough sketch of that scan follows below. I am not entirely sure this would be the ideal solution, but I hope I covered some part of your question. pdb is just a debugging package that allows you to execute your code step by step.
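A rough sketch of that row/column scan (boundary_points is an illustrative name, and mask is assumed to be the 2-D class array from above; segment pixels on the array border are only caught via their inner neighbours):

def boundary_points(mask, label):
    # collect (row, col) coordinates where membership in `label`
    # flips between horizontal or vertical neighbours
    inside = (mask == label)
    rows, cols = inside.shape
    points = set()
    for r in range(rows):                          # scan along rows
        for c in range(1, cols):
            if inside[r, c] != inside[r, c - 1]:
                points.add((r, c) if inside[r, c] else (r, c - 1))
    for c in range(cols):                          # scan along columns
        for r in range(1, rows):
            if inside[r, c] != inside[r - 1, c]:
                points.add((r, c) if inside[r, c] else (r - 1, c))
    return sorted(points)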
Upvotes: 3