Reputation: 41
I'm training a model to recognize hands and want to extract the segmentation masks after detection, using matterport's Mask R-CNN (https://github.com/matterport/Mask_RCNN):
import os
import cv2
import mrcnn.model
import mrcnn.visualize

# Build the model in inference mode and load the trained weights
model = mrcnn.model.MaskRCNN(mode="inference",
                             config=SimpleConfig(),
                             model_dir=os.getcwd())
model.load_weights(filepath="mask_rcnn_0028.h5",
                   by_name=True)

# OpenCV loads images as BGR; convert to RGB before detection
image = cv2.imread("CARDS_COURTYARD.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Run detection and visualize the result
results = model.detect([image], verbose=0)
r = results[0]
mrcnn.visualize.display_instances(image=image,
                                  boxes=r['rois'],
                                  masks=r['masks'],
                                  class_ids=r['class_ids'],
                                  class_names=CLASS_NAMES,
                                  scores=r['scores'])
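SimpleConfig and CLASS_NAMES aren't shown above; for reference, a minimal inference config for matterport's Mask R-CNN looks roughly like this (the class names other than 'yourright' are placeholders):

import mrcnn.config

# Assumption: background plus two hand classes; only 'yourright' is confirmed below
CLASS_NAMES = ['BG', 'yourright', 'yourleft']

class SimpleConfig(mrcnn.config.Config):
    # Run inference on a single image at a time
    NAME = "hands_inference"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    # Number of classes, including the background class
    NUM_CLASSES = len(CLASS_NAMES)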
Here is an example detection:
[MaskRCNN hands detection output image]
After detection, I reshape the boolean masks array (returned by the model as r['masks']) so I can access each segmentation mask individually (masks[0] being the mask for the first class id, in this case 'yourright'), and save each array as an image:
from PIL import Image

masks = r['masks']
masks = masks.reshape(2, 720, 1280)
im = Image.fromarray(masks[0])
im.save("mask.jpeg")
My output from this is: [distorted, 'zoomed-in' mask image]
Whilst this has the shape of the segmentation mask, and the dimensions match the original image, the output is not the segmentation as it appears in the original image. I want the extracted masks to be output as they are overlaid on the original image, not 'zoomed in' as they currently are. I assumed that because the masks array has the same dimensions as the original image, the masks would retain their position, but apparently not. How can I output the segmentation masks as they appear in the original image?
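For reference, printing the array's shape and dtype shows the layout (the (height, width, num_instances) ordering is matterport's convention for r['masks']):

print(r['masks'].shape)   # (720, 1280, 2): height, width, number of detected instances
print(r['masks'].dtype)   # bool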
cheers
Upvotes: 2
Views: 2214
Reputation: 41
Figured out the solution myself; posting it here in case anyone else runs into the same issue...
The problem is that I misunderstood how reshaping an array works. Moving the third dimension to the first with reshape isn't a superficial change: NumPy keeps the data in its flat, row-major order and simply reinterprets it under the new shape, so the pixels end up scrambled and any image extracted from the result is distorted (it only vaguely resembles the original mask because neighbouring pixels stay next to each other in the flat buffer). Reshaping is also entirely unnecessary here, as you can index any dimension directly, irrespective of its position. I previously thought that to access only the third dimension it had to be reshaped to appear as the first:
masks = masks.reshape(2, 720, 1280)
im = Image.fromarray(masks[0])
Changing the shape in this way reorganises the data and distorts the image. Instead, you can simply index the dimension you want:
im = Image.fromarray(masks[:,:,0])
In this case, I'm taking slice 0 along the third dimension, which gives the full-resolution (720, 1280) boolean mask of the first detected instance.
Converting this to an image produces the mask as it appears in the detection image:
[yourright detection]: https://i.sstatic.net/ewMY3.jpg
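For completeness, here is a sketch that extracts and saves every instance mask this way (the PNG filenames and the bool-to-0/255 scaling are my own choices, not anything the library requires):

import numpy as np
from PIL import Image

masks = r['masks']  # boolean array of shape (height, width, num_instances)
for i in range(masks.shape[-1]):
    # Slice one instance along the last axis and scale bool -> 0/255 so it saves cleanly
    mask_img = Image.fromarray(masks[:, :, i].astype(np.uint8) * 255)
    mask_img.save(f"mask_{i}.png")

If you actually want an (instances, height, width) layout, np.moveaxis(masks, -1, 0) rearranges the axes without scrambling the data, unlike reshape.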
An easy mistake to make, especially if, like me, you are extremely new to Python!
Upvotes: 2