Reputation: 135
When doing semantic segmentation with the U-Net for example, it seems to be common-practice to provide the label data as one-hot-encoded tensors. In another SO question, a user pointed out that this is due to the labels usually representing categorical values. Feeding them to the network as class labels within only one layer (as greyscale intensity values) would introduce difficulties.
In another blog post however, the author explains that the labels
"[...] sometimes [get] packaged as greyscale images, where the pixel intensity represents the class id [...]. This method can be the easiest to work with. It allows for a small file size for distribution and [...] One Hot Vector representations [use] up more memory than [the greyscale encoding format]."
My hardware is only very limited, and I am hoping that encoding the labels as 1-layered greyscale tensors, rather than n-layered (n being the number of classes to segment), will lead to lower memory usage. However, the author of the blog then also states:
"Even if the deep learning framework you use accepts the labels data as class ids, as in [the greyscale format], it will convert that data to one-hot encoding behind the scenes."
Does this mean, there wouldn't be any savings memory-wise after all?
If it is worthwhile, how would I go on to implementing this in the dataset-reader? I also haven't encountered any implementation, where the greyscale labeling has in fact been practiced. I'd therefore also be thankful for any links to implementations that have been using greyscale labels for semantic segmentation!
I am working with PyTorch and my code is based on this implementation, with the difference that I have 3 classes to segment.
Any suggestions/links are greatly appreciated!
Upvotes: 1
Views: 1755
Reputation: 745
This can help you save disk memory as you will be able to store the labels, the ground truth, as a greyscale image (width, heigh, 1) and not as a bigger 3D tensor of shape (width, height, n). But, during the training process, you will have to convert the greyscale ground truth images to 3D tensor to be able to train your network. So It won't help you to reduce the RAM cost of the process.
If you really need to reduce the RAM usage, you can decrease the training batch size, or the image sizes.
Upvotes: 2