Prosciutt0

Reputation: 55

Transforms.Normalize returns values higher than 255 Pytorch

I am working on a video dataset. I read the frames as integers and convert them to a float32 NumPy array. After being loaded, the values are in the range 0 to 255:

   [165., 193., 148.],
   [166., 193., 149.],
   [167., 193., 149.],
   ...

Finally, to feed them to my model and stack the frames, I apply ToTensor() plus my transformation [transforms.Resize(224), transforms.Normalize([0.454, 0.390, 0.331], [0.164, 0.187, 0.152])],

and here the code to transform and stack the frames:

res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i])))
res_vframes = torch.stack(res_vframes, 0)
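
For context, self.transforms is presumably composed like this (the exact Compose is not shown here, so this is an assumption based on the list above):

import torchvision.transforms as transforms

# assumed composition of self.transforms, matching the list mentioned above
self.transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(224),
    transforms.Normalize([0.454, 0.390, 0.331], [0.164, 0.187, 0.152]),
])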

The problem is that after the transformation the values look like this, with entries far higher than 255:

[tensor([[[1003.3293, 1009.4268, 1015.5244,  ..., 1039.9147, 1039.9147,
          1039.9147],...

Any idea on what I am missing or doing wrong?

Upvotes: 0

Views: 1805

Answers (2)

Hayoung

Reputation: 537

The behavior of torchvision.transforms.Normalize:

output[channel] = (input[channel] - mean[channel]) / std[channel]

Since your inputs are still in the 0-255 range, the numerator on the right-hand side is much greater than 1, while the denominator (the std you pass) is smaller than 1, so the computed values get large.
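
A quick check with the first pixel value from the question reproduces the observed output, assuming ToTensor() did not rescale the float32 input:

# first value from the question, still in the 0-255 range
x = 165.0
mean, std = 0.454, 0.164
print((x - mean) / std)  # ~1003.33, matching the printed tensor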

The class ToTensor() maps values to [0, 1] only if a condition is satisfied: the input must be of type uint8 (a ByteTensor after conversion). Check this code from the torchvision source:

if isinstance(pic, np.ndarray):
    # handle numpy array
    if pic.ndim == 2:
        pic = pic[:, :, None]

    img = torch.from_numpy(pic.transpose((2, 0, 1))).contiguous()
    # backward compatibility
    if isinstance(img, torch.ByteTensor):
        return img.to(dtype=default_float_dtype).div(255)
    else:
        return img

Therefore you either need to divide the tensors by 255 explicitly, or make your input match the condition above (i.e. pass it as uint8).
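
A minimal sketch of both options, assuming the frames are float32 NumPy arrays in the 0-255 range as described in the question:

import numpy as np

frame = v_frames[i]  # float32, values in 0-255

# Option 1: rescale to 0-1 explicitly before the transforms
res_vframes.append(self.transforms(frame / 255.0))

# Option 2: cast to uint8 so ToTensor() takes the ByteTensor branch
# and divides by 255 itself
res_vframes.append(self.transforms(frame.astype(np.uint8)))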

Upvotes: 2

Ophir Yaniv

Reputation: 366

Your normalization statistics (mean and std) are for values in the 0-1 range, not 0-255.

You need to scale your input frames to 0-1, or scale the normalization vectors to 0-255.

You can divide the frames by 255 before using the transform:

res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i]/255)))
res_vframes = torch.stack(res_vframes, 0)
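
Alternatively, you can keep the frames in 0-255 and rescale the normalization statistics instead (a sketch, assuming the rest of the pipeline stays the same):

# mean and std rescaled from the 0-1 range to the 0-255 range
transforms.Normalize(
    [0.454 * 255, 0.390 * 255, 0.331 * 255],
    [0.164 * 255, 0.187 * 255, 0.152 * 255],
)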

Upvotes: 0
