Reputation: 41
I trained a U-Net with inputs of 120 x 120 satellite image patches.
I need to apply my model to a much bigger image (10980 x 10980). What I tried was to slice the big image into 120 x 120 tiles, classify each tile, and assemble the predictions into a new image.
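For concreteness, the stitching loop looks roughly like this (a simplified sketch; `model` stands in for the U-Net's predict call):

```python
import numpy as np

TILE = 120

def predict_tiled(image, model):
    """Naively tile `image` (H x W x C), classify each 120 x 120 patch
    independently, and stitch the label maps back together.
    Assumes H and W are multiples of TILE (10980 is not, so the real
    image needs padding or edge handling first)."""
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.int64)
    for y in range(0, h - TILE + 1, TILE):
        for x in range(0, w - TILE + 1, TILE):
            out[y:y + TILE, x:x + TILE] = model(image[y:y + TILE, x:x + TILE])
    return out
```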
My question is: is this approach viable, given the discontinuities I can see in my output image below?
PS: I saw this question, semantic segmentation for large images, where a user said it's doable. If so, is there any way to make the borders more continuous?
Upvotes: 3
Views: 1034
Reputation: 5948
Shai's answer only works when the model has a reasonably small receptive field. The trend in modern networks is to incorporate more global information (e.g. ViTs), which makes every output pixel depend on the exact boundaries of the input. When that happens, Shai's answer is only a partial fix: you'll still get discontinuities.
Smoothly-Blend-Image-Patches (as suggested by ferlix), which I'll call SBIP here, is a nice algorithm, but its implementation only works for small images because it pushes everything through the model at once. It also lacks many configuration options.
The algorithm runs the model on overlapping tiles, but where the tiles overlap, instead of simply cropping the output as in Shai's answer, SBIP smoothly blends the logits from one tile into the next: at one edge of each overlap region the logits come entirely from one tile, at the opposite edge they come entirely from the adjacent tile, and in between they are crossfaded.
Here's a rough explanation of the special case of using 50% overlap (maximally smooth):
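A minimal numpy sketch of the idea, assuming `model` returns per-pixel logits of shape `(tile, tile, n_classes)`; the squared-sine taper below stands in for SBIP's spline window:

```python
import numpy as np

def blend_predict(image, model, tile=120, n_classes=2):
    """Run the model on 50%-overlapping tiles and crossfade the logits,
    so every interior pixel is a weighted mix of up to 4 tiles.
    Assumes (H - tile) is a multiple of tile // 2; pad the image first
    if it is not."""
    stride = tile // 2
    # 1-D squared-sine taper: ~0 at the tile edges, 1 in the middle.
    # Two tapers offset by `stride` sum to exactly 1 across the overlap.
    w1 = np.sin(np.pi * (np.arange(tile) + 0.5) / tile) ** 2
    window = np.outer(w1, w1)[..., None]              # (tile, tile, 1)

    h, w = image.shape[:2]
    logits = np.zeros((h, w, n_classes))
    weight = np.zeros((h, w, 1))
    for y in range(0, h - tile + 1, stride):
        for x in range(0, w - tile + 1, stride):
            pred = model(image[y:y + tile, x:x + tile])  # (tile, tile, n_classes)
            logits[y:y + tile, x:x + tile] += window * pred
            weight[y:y + tile, x:x + tile] += window
    return logits / np.maximum(weight, 1e-8)  # renormalise near the borders
```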
I independently discovered this solution and made a GitHub repo implementing it. I also fixed the problems with SBIP and expanded the supported cases.
Upvotes: 0
Reputation: 71
I think this library does what you need, using interpolation with a simple second-order spline window function:
https://github.com/Vooban/Smoothly-Blend-Image-Patches
It only works if your original image is not extremely big, because of memory constraints.
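If memory allows, usage looks roughly like this (adapted from the repo's README example; `input_img`, `model`, and `n_classes` are your own objects, and you should check `smooth_tiled_predictions.py` for the exact signature):

```python
from smooth_tiled_predictions import predict_img_with_smooth_windowing

# input_img: the large image as an (H, W, channels) float array
# model:     the trained Keras U-Net
predictions_smooth = predict_img_with_smooth_windowing(
    input_img,
    window_size=120,      # the patch size the model was trained on
    subdivisions=2,       # overlap factor; must be even
    nb_classes=n_classes,
    pred_func=lambda patch_batch: model.predict(patch_batch),
)
```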
Upvotes: 1
Reputation: 114926
If your model is fully convolutional, you can trivially apply it to larger images. Your only limitation is your device's memory size.
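For example (a toy fully convolutional net, not your U-Net, just to show that the same weights accept any spatial size):

```python
import torch
import torch.nn as nn

# A toy fully convolutional segmenter: no Linear layers and no fixed
# reshapes, so it accepts any input size (up to memory limits).
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),  # per-pixel class logits
)
print(net(torch.randn(1, 3, 120, 120)).shape)    # the training size
print(net(torch.randn(1, 3, 1098, 1098)).shape)  # ~9x larger, same weights
```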
If you have no choice but to slice the image, you can still avoid discontinuities by taking your model's receptive field into account:
If you extract much larger crops - large enough to account for the true size of the receptive field - and keep only the central, "valid", part of the output mask, you should get a smooth and continuous mask.
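A rough sketch of that scheme (assuming the network is fully convolutional so it accepts the larger crops; `model` and `margin` are placeholders, with `margin` at least half the receptive field):

```python
import numpy as np

def predict_valid(image, model, tile=120, margin=32):
    """Classify (tile + 2*margin)-sized crops but keep only the central
    tile x tile output, whose receptive field lies fully inside the crop."""
    h, w = image.shape[:2]
    padded = np.pad(image, ((margin, margin), (margin, margin), (0, 0)),
                    mode="reflect")               # reflect-pad the borders
    out = np.zeros((h, w), dtype=np.int64)
    big = tile + 2 * margin
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            y0, x0 = min(y, h - tile), min(x, w - tile)  # clamp the last crops
            pred = model(padded[y0:y0 + big, x0:x0 + big])  # (big, big) labels
            out[y0:y0 + tile, x0:x0 + tile] = pred[margin:-margin,
                                                   margin:-margin]
    return out
```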
Upvotes: 1