Deep-Learner

Reputation: 119

Ground Truth Image to One Hot Encoded Array (Semantic Segmentation)

I'm creating my own dataset for people and street segmentation. Below, you see a labeled ground truth (GT) image.

Ground Truth

In the past I did a simple regression between the model output and the GT image (back then I only used streets). Now I have read that cross-entropy loss is more common in this case. Since my GT image and my model output have the same width w and height h as the input image, I have to create an array of size h x w x c, where c is the number of classes (in my case 3: background, street, people). I think this is called a one-hot encoded array.

I solved this as follows:

        for height in range(len(img_as_np_array)):
            for width in range(len(img_as_np_array[0])):
                # one-hot vector for this pixel: 1 at the class index, 0 elsewhere
                temp = np.zeros(classes)
                temp[get_class(img_as_np_array[height, width])] = 1
                one_hot_label[height, width] = temp

where the method get_class(channels) decides the pixel class by the color of the pixel.

def get_class(channels):
    threshold = 40
    # Class 1 corresponds to streets/roads (orange, ~[243, 169, 0])
    if 243 - threshold <= channels[0] < 243 + threshold and \
            169 - threshold <= channels[1] < 169 + threshold and \
            channels[2] < threshold:
        return 1

    # Class 2 corresponds to people (blue, ~[0, 163, 232])
    if channels[0] < threshold and \
            163 - threshold <= channels[1] < 163 + threshold and \
            232 - threshold <= channels[2] < 232 + threshold:
        return 2

    # Class 0 corresponds to background and everything else
    return 0

I have two questions:

  1. My approach is very slow (about 3 minutes for a Full HD image). Is there a way to speed this up?

  2. I noticed that the channel values differ from the expected colors. For example, orange should be [243, 169, 0] (RGB), but I found entries like [206, 172, 8] or even [207, 176, 24]. Could that happen because I store my labels as JPG? And is there a better way to find the orange and blue pixels than my threshold idea above?

EDIT:

I solved the first question myself. This takes 2 or 3 seconds for a Full HD image:

threshold = 40
class_1_shape_cond_1 = (img_as_array[:, :, 0] >= 243 - threshold) & (img_as_array[:, :, 0] <= 243 + threshold)
class_1_shape_cond_2 = (img_as_array[:, :, 1] >= 171 - threshold) & (img_as_array[:, :, 1] <= 171 + threshold)
class_1_shape_cond_3 = (img_as_array[:, :, 2] >= 0) & (img_as_array[:, :, 2] <= threshold)
class_1_shape = class_1_shape_cond_1 & class_1_shape_cond_2 & class_1_shape_cond_3

Then I do the same for class 2. For class 3 (everything else) I can do:

class_3_shape = 1 - (class_1_shape + class_2_shape)

After that I have to adjust the type with:

class_1_shape = class_1_shape.astype(np.uint8)
class_2_shape = class_2_shape.astype(np.uint8)
class_3_shape = class_3_shape.astype(np.uint8)
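The per-class boolean masks can then be stacked into the final h x w x c one-hot array in a single call. Here is a self-contained sketch of that idea (the tiny 2x2 `img_as_array` is a made-up stand-in for a real GT image, and the masks follow the vectorized conditions above):

```python
import numpy as np

# Hypothetical 2x2 RGB "image" standing in for a real img_as_array.
img_as_array = np.array([
    [[243, 171, 10], [0, 163, 232]],
    [[10, 10, 10], [240, 170, 5]],
], dtype=np.uint8)

threshold = 40
r = img_as_array[:, :, 0].astype(int)
g = img_as_array[:, :, 1].astype(int)
b = img_as_array[:, :, 2].astype(int)

# Vectorized per-class masks, as in the EDIT above.
class_1_shape = (np.abs(r - 243) <= threshold) & (np.abs(g - 171) <= threshold) & (b <= threshold)
class_2_shape = (r <= threshold) & (np.abs(g - 163) <= threshold) & (np.abs(b - 232) <= threshold)
class_0_shape = ~(class_1_shape | class_2_shape)  # background: everything else

# Stack into h x w x c (c = 3: background, street, people).
one_hot_label = np.stack(
    [class_0_shape, class_1_shape, class_2_shape], axis=-1
).astype(np.uint8)
```

`np.stack` along the last axis replaces the three separate `astype` conversions and guarantees exactly one channel is 1 per pixel.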

Question 2 is still open.

Upvotes: 1

Views: 4107

Answers (2)

Shai

Reputation: 114786

Do NOT save labels as JPEG images!

JPEG is a lossy compression method: it is designed to save images using fewer bits, even if it slightly changes the pixel values, as long as the result "looks good" to a human observer.
This is NOT acceptable for training labels stored as images! You cannot afford inaccuracies in the labels. You must use a lossless format, e.g., PNG.

Better still, store your labels as indexed (palette) images to begin with and save yourself all the trouble of inferring the discrete labels from the RGB values.
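A minimal sketch of the indexed-label idea using Pillow (the palette colors and the filename `labels.png` are just illustrative choices): each pixel stores the class index 0..2 directly, and the palette only controls how the label map is displayed.

```python
import numpy as np
from PIL import Image

# Label map: each pixel already holds the class index
# (0 = background, 1 = street, 2 = people) -- no color decoding needed later.
labels = np.array([[0, 1],
                   [2, 0]], dtype=np.uint8)

img = Image.fromarray(labels, mode="P")
# Palette is display-only: background black, street orange, people blue.
img.putpalette([0, 0, 0,  243, 169, 0,  0, 163, 232])
img.save("labels.png")  # PNG is lossless, so the indices survive exactly

# Reading the file back yields the class indices directly.
restored = np.array(Image.open("labels.png"))
```

Because PNG stores the palette indices losslessly, `restored` is bit-identical to `labels`; no thresholding is ever needed.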

Upvotes: 4

Jiaqi liu

Reputation: 74

For the first problem: if you are using the TensorFlow framework, TF provides a function to generate a one-hot matrix rapidly.

tf.one_hot(
    indices,  # your label image (h x w array of class indices)
    depth,    # the number of classes
    on_value=None,
    off_value=None,
    axis=None,
    dtype=None,
    name=None
)

For more details, see https://www.tensorflow.org/api_docs/python/tf/one_hot?version=stable
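If TensorFlow is not in the pipeline yet, the same expansion can be done in plain NumPy by indexing an identity matrix with the label map. A sketch, where `label_img` is an assumed h x w array of class indices:

```python
import numpy as np

label_img = np.array([[0, 1],
                      [2, 0]])  # assumed h x w class-index map
depth = 3  # number of classes

# np.eye(depth)[i] is the one-hot row for class i; fancy indexing
# broadcasts this over the whole image, giving an h x w x depth array.
one_hot = np.eye(depth, dtype=np.uint8)[label_img]
```

This is a single vectorized operation, so it is fast even for Full HD label maps.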

Upvotes: 1
