Reputation: 1084
I have found a neural network for semantic segmentation. The network works just fine; I feed it my training, validation and test data and I get the output (segmented parts in different colors). Up to this point, all is OK. I am using Keras with TensorFlow 1.7.0, GPU enabled. Python version is 3.5.
What I want to achieve, though, is to get access to the pixel groups (segments) so that I can get their boundaries' image coordinates, i.e. an array of points that forms the boundary of segment X, shown in green in the prediction image.
How can I do that? Obviously I cannot put the entire code here, but here is a snippet that I should modify to achieve what I want:
I have the following in my evaluate function:
def evaluate(model_file):
    net = load_model(model_file, custom_objects={'iou_metric': create_iou_metric(1 + len(PART_NAMES)),
                                                 'acc_metric': create_accuracy_metric(1 + len(PART_NAMES), output_mode='pixelwise_mean')})
    img_size = net.input_shape[1]
    image_filename = lambda fp: fp + '.jpg'
    d_test_x = TensorResize((img_size, img_size))(ImageSource(TEST_DATA, image_filename=image_filename))
    d_test_x = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_test_x)
    d_test_pred = Predict(net)(d_test_x)
    d_test_pred.metadata['properties'] = ['background'] + PART_NAMES

    d_x, d_y = process_data(VALIDATION_DATA, img_size)
    d_x = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_x)
    d_y = AddBackgroundMap(use_lane_names=['Y'])(d_y)
    d_train = Join()([d_x, d_y])
    print('losses:', net.evaluate_generator(d_train.batch_array_tuple_generator(batch_size=3), 3))

    # the tensor which needs to be modified
    pred_y = Predict(net)(d_x)
    Visualize(('slices', 'labels'))(Join()([d_test_x, d_test_pred]))
    Visualize(('slices', 'labels', 'labels'))(Join()([d_x, pred_y, d_y]))
As for the Predict function, here is the snippet:
Alternatively, I've found that by using the following, one can get access to the tensor:
# for sample_img, in d_x.batch_array_tuple_generator(batch_size=3, n_samples=5):
#     aa = net.predict(sample_img)
#     indexes = np.argmax(aa, axis=3)
#     print(indexes)
#     import pdb
#     pdb.set_trace()
But I have no idea how this works; I've never used pdb before.
In case anyone also wants to see the training function, here it is:
def train(model_name='refine_res', k=3, recompute=False, img_size=224,
          epochs=10, train_decoder_only=False, augmentation_boost=2, learning_rate=0.001,
          opt='rmsprop'):
    print("Training on: " + str(PART_NAMES))
    print("In Total: " + str(1 + len(PART_NAMES)) + " parts.")
    metrics = [create_iou_metric(1 + len(PART_NAMES)),
               create_accuracy_metric(1 + len(PART_NAMES), output_mode='pixelwise_mean')]

    if model_name == 'dummy':
        net = build_dummy((224, 224, 3), 1 + len(PART_NAMES))  # 1+ because of the background class
    elif model_name == 'refine_res':
        net = build_resnet50_upconv_refine((img_size, img_size, 3), 1 + len(PART_NAMES), k=k, optimizer=opt,
                                           learning_rate=learning_rate, softmax_top=True,
                                           objective_function=categorical_crossentropy,
                                           metrics=metrics, train_full=not train_decoder_only)
    elif model_name == 'vgg_upconv':
        net = build_vgg_upconv((img_size, img_size, 3), 1 + len(PART_NAMES), k=k, optimizer=opt,
                               learning_rate=learning_rate, softmax_top=True,
                               objective_function=categorical_crossentropy, metrics=metrics,
                               train_full=not train_decoder_only)
    else:
        net = load_model(model_name)

    d_x, d_y = process_data(TRAINING_DATA, img_size, recompute=recompute, ignore_cache=False)
    d = Join()([d_x, d_y])

    # create more samples by rotating top view images and translating
    images_to_be_rotated = {}
    factor = 5
    for root, dirs, files in os.walk(TRAINING_DATA, topdown=False):
        for name in dirs:
            format = str(name + '/' + name)  # construct the format of foldername/foldername
            images_to_be_rotated.update({format: factor})

    d_aug = ImageAugmentation(factor_per_filepath_prefix=images_to_be_rotated, rotation_variance=90, recalc_base_seed=True)(d)
    d_aug = ImageAugmentation(factor=3 * augmentation_boost, color_interval=0.03, shift_interval=0.1, contrast=0.4, recalc_base_seed=True, use_lane_names=['X'])(d_aug)
    d_aug = ImageAugmentation(factor=2, rotation_variance=20, recalc_base_seed=True)(d_aug)
    d_aug = ImageAugmentation(factor=7 * augmentation_boost, rotation_variance=10, translation=35, mirror=True, recalc_base_seed=True)(d_aug)

    # apply augmentation on the images of the training dataset only
    d_aug = AddBackgroundMap(use_lane_names=['Y'])(d_aug)
    d_aug.metadata['properties'] = ['background'] + PART_NAMES

    # subtract mean and shuffle
    d_aug = Shuffle()(d_aug)
    d_aug, d_val = RandomSplit(0.8)(d_aug)
    d_aug = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_aug)
    d_val = PixelwiseSubstract([103.93, 116.78, 123.68], use_lane_names=['X'])(d_val)

    # Visualize()(d_aug)
    d_aug.configure()
    d_val.configure()
    print('training size:', d_aug.size())

    batch_size = 4
    callbacks = []
    # callbacks += [EarlyStopping(patience=10)]
    callbacks += [ModelCheckpoint(filepath="trained_models/" + model_name + '.hdf5', monitor='val_iou_metric', mode='max',
                                  verbose=1, save_best_only=True)]
    callbacks += [CSVLogger('logs/' + model_name + '.csv')]
    history = History()
    callbacks += [history]

    # sess = K.get_session()
    # sess.run(tf.initialize_local_variables())
    net.fit_generator(d_aug.batch_array_tuple_generator(batch_size=batch_size, shuffle_samples=True),
                      steps_per_epoch=d_aug.size() // batch_size,
                      validation_data=d_val.batch_array_tuple_generator(batch_size=batch_size),
                      validation_steps=d_val.size() // batch_size,
                      callbacks=callbacks, epochs=epochs)

    return {k: (max(history.history[k]), min(history.history[k])) for k in history.history.keys()}
Upvotes: 4
Views: 1080
Reputation: 2322
For segmentation tasks, assuming that your batch is one image, each pixel in the image is assigned a probability of belonging to each class. Suppose you have 5 classes and the image has 784 pixels (28x28); from net.predict you will get
an array of shape (784,5)
Each of the 784 pixels is assigned 5 probability values, one per class. When you do
np.argmax(aa, axis=3)
(argmax over the class axis, which is the last one when the prediction keeps its batch and spatial dimensions), you get the index of the highest probability for each pixel, which would be of shape (784,). You can then reshape it to 28x28 with
indexes.reshape(28,28)
and you get the mask of your predictions.
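Put together as a minimal, self-contained sketch (the random aa here is only a stand-in for the real net.predict output, and the 28x28 / 5-class shapes are assumptions):

import numpy as np

aa = np.random.rand(784, 5)           # stand-in for net.predict on one 28x28 image, 5 classes
indexes = np.argmax(aa, axis=-1)      # per-pixel class index, shape (784,)
mask = indexes.reshape(28, 28)        # 28x28 prediction mask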
Reducing the problem to a 7x7 array with 4 classes (0-3), that looks like
array([[2, 1, 0, 1, 2, 3, 1],
[3, 1, 1, 0, 3, 0, 0],
[3, 3, 2, 2, 0, 3, 1],
[1, 1, 0, 3, 1, 3, 1],
[0, 0, 0, 3, 3, 1, 0],
[1, 2, 3, 0, 1, 2, 3],
[0, 2, 1, 1, 0, 1, 3]])
Say you want to extract the indexes where the model predicted class 1:
segment_1 = np.where(indexes == 1)
Since indexes is a 2-dimensional array, segment_1 will be a tuple of two arrays, where the first array holds the row indices and the second the column indices:
(array([0, 0, 0, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 6, 6]), array([1, 3, 6, 1, 2, 6, 0, 1, 4, 6, 5, 0, 4, 2, 3, 5]))
Looking at the first number in the first and the second array, 0 and 1: they point to where the first match is located in indexes (row 0, column 1).
You can extract the values like this:
indexes[segment_1]
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
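If you need those positions as an explicit array of points (which is what the question asks for), the tuple returned by np.where zips directly into (row, col) pairs; a one-line sketch using the segment_1 from above:

points = list(zip(*segment_1))        # e.g. [(0, 1), (0, 3), (0, 6), ...]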
You can then proceed with the second class you want to get, let's say 2:
segment_2 = np.where(indexes == 2)
segment_2
(array([0, 0, 2, 2, 5, 5, 6]), array([0, 4, 2, 3, 1, 5, 1]))
And if you want a mask for each class by itself, you can create a copy of indexes for each class (4 copies in total), e.g. class_1 = indexes.copy() (use .copy(), otherwise you would overwrite indexes itself), and set to zero any value that is not equal to 1: class_1[class_1 != 1] = 0. You get something like this:
array([[0, 1, 0, 1, 0, 0, 1],
[0, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1],
[1, 1, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0]])
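A short sketch of that per-class loop (class_masks is just an illustrative name; class 0 is skipped because zeroing out non-matching values is degenerate for the 0 label):

class_masks = {}
for c in range(1, 4):                 # classes 1-3 from the toy example
    m = indexes.copy()                # .copy() keeps indexes itself intact
    m[m != c] = 0                     # keep only pixels predicted as class c
    class_masks[c] = m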
To the eye it may look like there are contours, but from this example you can tell that there is no clear contour for each segment. The only way I could think of is to loop over the image in rows and record where the value changes, and to do the same in columns; a rough sketch of that scan follows below. I am not entirely sure this would be the ideal solution, but I hope I covered some part of your question. pdb is just a debugging package that allows you to execute your code step by step.
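A rough sketch of that row/column scan (boundary_points is an illustrative name, and mask is assumed to be the 2-D class array from above; segment pixels on the array border are only caught via their inner neighbours):

def boundary_points(mask, label):
    # collect (row, col) coordinates where membership in `label`
    # flips between horizontal or vertical neighbours
    inside = (mask == label)
    rows, cols = inside.shape
    points = set()
    for r in range(rows):                          # scan along rows
        for c in range(1, cols):
            if inside[r, c] != inside[r, c - 1]:
                points.add((r, c) if inside[r, c] else (r, c - 1))
    for c in range(cols):                          # scan along columns
        for r in range(1, rows):
            if inside[r, c] != inside[r - 1, c]:
                points.add((r, c) if inside[r, c] else (r - 1, c))
    return sorted(points)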
Upvotes: 3