justin_sakong

Reputation: 289

Input image size of Faster-RCNN model in Pytorch

I'm trying to implement a Faster R-CNN model with PyTorch. In the model's structure, the first element is a transform.

from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True)

print(model.transform)
GeneralizedRCNNTransform(
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    Resize(min_size=(800,), max_size=1333, mode='bilinear')
)

When images pass through the forward of Resize(), they come out as (800, h) or (w, 1333), depending on the ratio of width and height.

for i in range(2):
    _, image, target = testset.__getitem__(i)
    img = image.unsqueeze(0)
    output, _ = model.transform(img)  # output is an ImageList; output.image_sizes holds the resized sizes

Before Transform : torch.Size([512, 640])
After Transform : [(800, 1000)]
Before Transform : torch.Size([315, 640])
After Transform : [(656, 1333)]

My question is: how are those resized outputs computed, and why is this method used? I can't find the information in the paper, and I don't understand the source code of the transform in fasterrcnn_resnet50_fpn.

Sorry for my English

Upvotes: 3

Views: 4446

Answers (1)

Omkar Shidore

Reputation: 31

GeneralizedRCNN data transform: https://github.com/pytorch/vision/blob/922db3086e654871c35cd80c2c01eabb65d78475/torchvision/models/detection/generalized_rcnn.py#L15

It performs the data transformation on the inputs before they are fed into the model:

min_size: minimum size (the shorter side) of the image after rescaling, before it is fed to the backbone.

max_size: maximum size (the longer side) of the image after rescaling, before it is fed to the backbone.

The defaults (min_size=800, max_size=1333) are set in faster_rcnn.py: https://github.com/pytorch/vision/blob/main/torchvision/models/detection/faster_rcnn.py#L256
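In short, the transform picks one scale factor per image: it scales the shorter side up to min_size, unless that would push the longer side past max_size, in which case the longer side is clamped to max_size instead. Here is a minimal sketch of that rule (expected_resized_shape is my own helper name, not part of torchvision, and the real code may differ by a pixel due to interpolation rounding); it reproduces the shapes from the question:

def expected_resized_shape(h, w, min_size=800, max_size=1333):
    # Scale so the shorter side reaches min_size, but never let the longer side exceed max_size.
    scale = min(min_size / min(h, w), max_size / max(h, w))
    return int(h * scale), int(w * scale)

print(expected_resized_shape(512, 640))  # (800, 1000) -> shorter side hits 800
print(expected_resized_shape(315, 640))  # (656, 1333) -> longer side capped at 1333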

I couldn't find out why the defaults are min 800 and max 1333 either; I didn't find anything about it in the research paper.

But since the first layer is a conv layer, the network does not require a fixed input size. I apply many other augmentations, such as mirroring and random cropping, inspired by SSD-based networks, so I prefer to do all augmentation in one place once rather than twice. I would also assume the model works best during validation on images whose shapes and other properties are as close as possible to the training data.

Though you can experiment with a custom min_size and max_size:

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

min_size = 900    # changed from the default 800
max_size = 1433   # changed from the default 1333
image_mean = [0.485, 0.456, 0.406]
image_std = [0.229, 0.224, 0.225]

model = fasterrcnn_resnet50_fpn(pretrained=True, min_size=min_size, max_size=max_size,
                                image_mean=image_mean, image_std=image_std)

# batch of 4 images, 11 random boxes per image
images, boxes = torch.rand(4, 3, 600, 1200), torch.rand(4, 11, 4)
boxes[:, :, 2:] += boxes[:, :, :2]  # make sure x2 > x1 and y2 > y1
labels = torch.randint(1, 91, (4, 11))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)

output = model(images, targets)
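If you want to confirm that the custom values were picked up, here is a quick check (assuming the model built above; the expected values are what I would expect from GeneralizedRCNNTransform, which stores min_size as a tuple):

print(model.transform.min_size, model.transform.max_size)  # expected: (900,) 1433

# A 512x640 image should now be rescaled so its shorter side becomes 900:
image_list, _ = model.transform([torch.rand(3, 512, 640)])
print(image_list.image_sizes)  # expected: [(900, 1125)]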

Or you can write your own transforms entirely: https://pytorch.org/vision/stable/transforms.html

from torchvision.transforms import transforms as T
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.transform = T.Compose([...])  # fill in transforms from torchvision.transforms
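One caveat with replacing model.transform directly: the model's forward pass expects its transform to return an ImageList (and later calls transform.postprocess), so a plain T.Compose will not slot in unchanged. A simpler pattern, sketched below under the assumption that you have a PIL image (the file name is a placeholder), is to keep the built-in transform and apply your own torchvision transforms to the images before they reach the model:

import torch
from PIL import Image
from torchvision import transforms as T
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Custom preprocessing / augmentation, applied before the model call
# (e.g. inside your Dataset's __getitem__).
custom_transform = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2),  # example augmentation
    T.ToTensor(),                                 # PIL image -> CHW float tensor in [0, 1]
])

pil_img = Image.open('some_image.jpg')            # placeholder file name
img = custom_transform(pil_img)

with torch.no_grad():
    prediction = model([img])[0]                  # dict with 'boxes', 'labels', 'scores'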

Hope this helps.

Upvotes: 3
