Reputation: 83
I'm writing code for image segmentation in Python and trying to implement some different metrics for the model, for example Euclidean distance, like this:
def euc_dist(y_true, y_pred):
    # smooth is a small constant defined elsewhere in my code
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    return K.sqrt(K.sum(K.square(y_true_f - y_pred_f)) + smooth)
After fitting the model, results:
Epoch 1/20
9/9 [==============================] - 6s 327ms/step - loss: -0.1111 - euc_dist: 199.7511 - val_loss: -0.6014 - val_euc_dist: 72.3878
Epoch 2/20
9/9 [==============================] - 2s 181ms/step - loss: -0.2737 - euc_dist: 43.5101 - val_loss: -0.2338 - val_euc_dist: 86.8938
Epoch 3/20
9/9 [==============================] - 2s 180ms/step - loss: -0.3146 - euc_dist: 37.0605 - val_loss: -0.5544 - val_euc_dist: 72.5370
Epoch 4/20
9/9 [==============================] - 2s 182ms/step - loss: -0.3611 - euc_dist: 37.9538 - val_loss: -0.5871 - val_euc_dist: 71.2690
Epoch 5/20
9/9 [==============================] - 2s 210ms/step - loss: -0.4189 - euc_dist: 31.2415 - val_loss: -0.6047 - val_euc_dist: 72.1360
Epoch 6/20
9/9 [==============================] - 2s 180ms/step - loss: -0.5565 - euc_dist: 35.6810 - val_loss: -0.5976 - val_euc_dist: 69.8348
Epoch 7/20
9/9 [==============================] - 2s 180ms/step - loss: -0.6854 - euc_dist: 27.3660 - val_loss: -0.6059 - val_euc_dist: 65.5709
Epoch 8/20
9/9 [==============================] - 2s 211ms/step - loss: -0.7679 - euc_dist: 23.3767 - val_loss: -0.5841 - val_euc_dist: 67.5863
and so on. How should I interpret these results, in particular 'euc_dist', in terms of the similarity between two images? Thank you very much.
Upvotes: 1
Views: 1011
Reputation: 1094
I will assume that the images are RGB, which means that every pixel can be represented as a triplet of R, G, B components.
For testing purposes, I'll use the numpy and PIL packages. PIL allows easy manipulation of images with the dtype uint8, which means the value of every R, G, B component is a number between 0 and 255 (u - unsigned, int8 - 8 bits are used to represent an integer).
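As a quick illustration of the uint8 range (a small snippet of my own, not part of the construction below), NumPy reports the type's bounds directly, and casting out-of-range values wraps them modulo 256:

```python
import numpy as np

info = np.iinfo(np.uint8)   # integer-type metadata for uint8
print(info.min, info.max)   # 0 and 255

# Casting larger integers to uint8 wraps modulo 256,
# which is why images are converted to uint8 carefully.
print(np.array([256, 300]).astype(np.uint8))  # wraps to 0 and 44
```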
So let's construct a simple image using numpy (disclaimer, this can probably be done in other ways, but my goal is to keep it simple and readable):
import numpy as np
from PIL import Image

arr1 = np.zeros([3, 3, 3], dtype=np.int32)
base_col = [0, 0, 0]  # starting color - pure black
for i in range(3):
    for j in range(3):
        arr1[i, j] = base_col
        base_col[0] = base_col[0] + 255 / 9  # red grows by 255/9 per pixel
img1 = Image.fromarray(arr1.astype(np.uint8))
img1.save('img1.png')
This is a 3x3 image, with 9 pixels. The pixels start from the color (0, 0, 0) (pure black) and move towards (226, 0, 0) (almost pure red). Every pixel in the matrix has its red component increased by 255/9 compared to the previous one:
Now let's create another image. This time, we'll do the exact same thing, except we'll change the value of the last pixel. The last pixel (bottom-right corner) previously had the RGB triplet (226, 0, 0), so we'll set the new one manually to (226, 0, 129) (we just increase the blue component by 129). This is what the new image looks like:
Makes sense, right? Because we only changed one pixel, altering its blue component, we got a purple-ish color. For clarity, here is the code used (everything is the same except the last line):
arr2 = np.zeros([3, 3, 3], dtype=np.int32)
base_col = [0, 0, 0]
for i in range(3):
    for j in range(3):
        arr2[i, j] = base_col
        base_col[0] = base_col[0] + 255 / 9
arr2[2, 2] = [226, 0, 129]  # the only difference
img2 = Image.fromarray(arr2.astype(np.uint8))
img2.save('img2.png')
Let's create a third image. This time we'll do the same thing we did with image 1, but instead of starting from pure black, let's start from an already grey-ish image. So, the starting triplet will be (25, 25, 25), instead of (0, 0, 0), and now we will increase the red component by 255/9 again:
arr3 = np.ones([3, 3, 3], dtype=np.int32)
arr3 = arr3 * 25  # start from (25, 25, 25) instead of (0, 0, 0)
base_col = [0, 0, 0]
for i in range(3):
    for j in range(3):
        arr3[i, j] = arr3[i, j] + base_col
        base_col[0] = base_col[0] + 255 / 9
img3 = Image.fromarray(arr3.astype(np.uint8))
img3.save('img3.png')
The end effect of this is that all pixels are actually just increased by (25, 25, 25), compared to the first picture. Here's what the 3rd image looks like:
Hard to see a difference at all, right? (compared to the first image).
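The "+25 everywhere" claim can be checked numerically. The helper below is my own condensed rewrite of the two construction loops above (the integer truncation on assignment keeps the offset at exactly 25 per channel):

```python
import numpy as np

def make(base):
    """Rebuild the arrays above: start from a constant base color and
    bump the red channel by 255/9 per pixel (truncated on int32 assignment)."""
    arr = np.full([3, 3, 3], base, dtype=np.int32)
    col = [0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            arr[i, j] = arr[i, j] + col  # float sum truncated back to int32
            col[0] += 255 / 9
    return arr

arr1, arr3 = make(0), make(25)
print(np.all(arr3 - arr1 == 25))  # every channel of every pixel shifts by 25
```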
So, why did I write all of this?
Because the Euclidean distance between img1 and img2, and between img1 and img3, is nearly identical.
The following code:
print(np.sqrt(np.sum(np.square(arr1.flatten() - arr2.flatten()))))
print(np.sqrt(np.sum(np.square(arr1.flatten() - arr3.flatten()))))
Prints out:
129.0
129.9038105676658
This is logical. Img1 and img2 only differ in one RGB component of one (the last) pixel: in img1 the last pixel has the triplet (226, 0, 0), while in img2 it has (226, 0, 129). So the only difference is 129; squaring it and then taking the square root gives back 129. This is what the Euclidean distance metric does.
What about img1 and img3? Well, the Euclidean metric does the following:
1.) find the difference between corresponding elements of the flattened arrays
2.) square each difference
3.) sum all the squares together
4.) take the square root of that sum
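The four steps above amount to a few lines of NumPy (this is my own sketch of the same computation, not code from the question):

```python
import numpy as np

def euclidean(a, b):
    """Steps 1-4: elementwise difference, square, sum, square root."""
    diff = a.astype(np.float64).ravel() - b.astype(np.float64).ravel()  # step 1
    return np.sqrt(np.sum(np.square(diff)))                            # steps 2-4

a = np.zeros(27)        # 27 numbers, like a flattened 3x3 RGB image
b = np.full(27, 25.0)   # every component off by 25, like img1 vs img3
print(euclidean(a, b))  # -> 129.9038105676658
```

`np.linalg.norm(a - b)` computes the same quantity in a single call.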
If we flatten our arrays of images 1 and images 3, we get the following:
print(arr1.flatten())
print(arr3.flatten())
Every individual RGB component of img3 is increased by 25 compared to the same pixel in img1. So, let's hand compute the Euclidean distance metric:
1.) 9 pixels, each with 3 components = 27 numbers in the flattened array. Every number in img3 is off by 25 compared to img1.
2.) square all the differences. Every difference is 25, so 25 x 25 = 625
3.) sum all the squared differences. We have 27 numbers, so that's 27 x 625 = 16875
4.) sqrt(16875) = 129.9038... (exactly the second value printed above)
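The hand computation checks out in code:

```python
import math

n_values = 9 * 3             # 9 pixels x 3 channels = 27 numbers
squared = 25 * 25            # each difference is 25, squared: 625
total = n_values * squared   # 27 * 625 = 16875
print(math.sqrt(total))      # matches the numpy print above: 129.9038105676658
```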
So the conclusion is:
The Euclidean metric will compare every pixel in both images, subtract the RGB triplets, square the differences, sum them together and take the root. This toy example showed that two images can differ wildly in a single pixel and score 129, while two other images can differ slightly in every pixel (almost impossible for the human eye to see) and score nearly the same value of the metric.
I don't know what the purpose of your program is, but if it's something scientific, you might want to check some well-known metrics for evaluating image-segmentation performance. Without giving it too much thought, this paper seems "ok":
https://ieeexplore.ieee.org/abstract/document/949797
Generally, it's a good idea to use several performance metrics. It's rarely the case that a single metric can give an adequate evaluation of anything.
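For binary segmentation masks, two widely used metrics are Intersection-over-Union (Jaccard index) and the Dice coefficient. Here is a minimal NumPy sketch (the function names and the smoothing constant `eps` are my own choices, not from any particular library):

```python
import numpy as np

def iou(y_true, y_pred, eps=1e-7):
    """Intersection over union for binary masks (0/1 arrays)."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    inter = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return (inter + eps) / (union + eps)

def dice(y_true, y_pred, eps=1e-7):
    """Dice coefficient: 2*|A and B| / (|A| + |B|)."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    inter = np.logical_and(y_true, y_pred).sum()
    return (2 * inter + eps) / (y_true.sum() + y_pred.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(iou(a, b), dice(a, b))  # half the foreground overlaps: IoU 0.5, Dice ~0.667
```

Unlike the raw Euclidean distance, both scores are bounded in [0, 1] and insensitive to a constant brightness shift of the kind shown with img3.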
Upvotes: 1