Reputation: 5553
When reading semantic segmentation papers, I sometimes come across the term one-hot labelling for mask images. I am not clear on what it really means. In some implementations, the masks usually have the shape rows*columns*2.
My guess is that one channel corresponds to the foreground and the other to the background. Is that right? Furthermore, how can I know which one is the foreground? If the existing training set only has the shape rows*columns*1, how can I convert it to this format, i.e., rows*columns*2? What I am doing now is just setting newimage[:,:,:,0] = original_image and newimage[:,:,:,1] = 1-original_image, but I am not sure whether that is right.
Upvotes: 2
Views: 2898
Reputation: 5162
Categorical labels like 1, 2, 3, 4, 5 don't have any natural ordering. Using those numbers directly might imply that label 5 is greater than label 1, but labels such as refrigerator and dog are just two categories with no natural order.
So we convert the labels 1,2,3,4,5 to
[1,0,0,0,0], [0,1,0,0,0], ...,[0,0,0,0,1]
Now they are just vectors, each pointing along its own axis, so no label is larger than another. That makes them easier to work with in logistic regression and with loss functions such as cross-entropy.
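As a minimal sketch of the conversion described above, assuming NumPy and the five integer labels 1 to 5 from the example (the variable names `labels`, `num_classes`, and `one_hot` are only illustrative):

```python
import numpy as np

# Hypothetical integer class labels 1..5, as in the example above.
labels = np.array([1, 2, 3, 4, 5])
num_classes = 5

# Row k of the identity matrix is the one-hot vector for index k,
# so shift the 1-based labels to 0-based indices before indexing.
one_hot = np.eye(num_classes, dtype=np.int64)[labels - 1]

print(one_hot)
```

Each row contains exactly one 1, and its position identifies the class, so no label is numerically larger than another.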
You can also encode foreground and background with rows*columns*1 alone: simply set the foreground pixels to 1 and the background pixels to 0, and you have a foreground/background mask.
I'd need to see an example of when to use rows*columns*2 because that one isn't as common and would probably vary depending upon where you saw it.
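For cases where a rows*columns*2 target is required, the following is a rough sketch of one way to build it from a single-channel binary mask, assuming NumPy and a mask whose values are exactly 0 and 1. The channel order (background in channel 0, foreground in channel 1) is an assumption made here so that the channel index equals the class id. The assignment in the question uses the opposite order, which works just as well as long as the rest of the pipeline reads the channels the same way.

```python
import numpy as np

# Hypothetical 2x3 binary mask: 1 = foreground, 0 = background.
mask = np.array([[0, 1, 1],
                 [0, 0, 1]], dtype=np.uint8)

# Assumed convention: channel 0 = background, channel 1 = foreground.
# Stacking along the last axis gives shape (rows, columns, 2).
two_channel = np.stack([1 - mask, mask], axis=-1)

# The same result via identity-matrix lookup (class id -> one-hot row).
via_eye = np.eye(2, dtype=mask.dtype)[mask]

# argmax over the channel axis recovers the original single-channel mask.
recovered = two_channel.argmax(axis=-1)

print(two_channel.shape)
```

Taking the argmax over the last axis is also how a two-channel prediction is usually turned back into a single-channel label map, so whichever channel order is chosen determines which index means foreground.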
Upvotes: 5