Reputation: 55
I am working on a video dataset. I read the frames as integers and convert them to a float32 NumPy array. After loading, the values lie in the range 0 to 255:
[165., 193., 148.],
[166., 193., 149.],
[167., 193., 149.],
...
Finally, to feed them to my model and stack the frames, I apply ToTensor() followed by my transformations [transforms.Resize(224), transforms.Normalize([0.454, 0.390, 0.331], [0.164, 0.187, 0.152])].
Here is the code that transforms and stacks the frames:
res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i])))
res_vframes = torch.stack(res_vframes, 0)
The problem is that after the transformation the values look like this, well above 255:
[tensor([[[1003.3293, 1009.4268, 1015.5244, ..., 1039.9147, 1039.9147,
1039.9147],...
Any idea on what I am missing or doing wrong?
Upvotes: 0
Views: 1805
Reputation: 537
The behavior of torchvision.transforms.Normalize is:
output[channel] = (input[channel] - mean[channel]) / std[channel]
Since your inputs are in [0, 255] while the means are below 1, the numerator on the right-hand side of the above equation is much greater than 1, and the denominator (the std) is smaller than 1, so the computed value becomes very large.
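As a quick check of the formula above, here is a minimal sketch in plain Python that applies it to the first red-channel value from the question (165) with the question's red-channel mean 0.454 and std 0.164. The helper name normalize_value is only illustrative and is not part of torchvision. It reproduces the 1003.33 seen in the question's output, and shows what the same pixel gives once it is scaled to [0, 1] first:

```python
def normalize_value(value, mean, std):
    """Apply the per-channel Normalize formula to a single value."""
    return (value - mean) / std


mean_r, std_r = 0.454, 0.164  # red-channel statistics from the question

# Pixel left in the 0-255 range, as in the question.
raw = normalize_value(165.0, mean_r, std_r)

# Same pixel scaled to [0, 1] before normalizing.
scaled = normalize_value(165.0 / 255.0, mean_r, std_r)

print(raw)     # roughly 1003.33
print(scaled)  # roughly 1.18
```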
The class ToTensor() scales the input's values to [0, 1] only if a certain condition is satisfied. Check this code from the official PyTorch docs:
if isinstance(pic, np.ndarray):
    # handle numpy array
    if pic.ndim == 2:
        pic = pic[:, :, None]
    img = torch.from_numpy(pic.transpose((2, 0, 1))).contiguous()
    # backward compatibility
    if isinstance(img, torch.ByteTensor):
        return img.to(dtype=default_float_dtype).div(255)
    else:
        return img
Therefore, you need to either divide the tensors by 255 explicitly or make the input satisfy the condition above, i.e. pass a uint8 array so that it becomes a ByteTensor.
Upvotes: 2
Reputation: 366
Your normalization values (mean and std) assume inputs in the range 0-1, not 0-255.
You need to either rescale your input frames to 0-1 or rescale the normalization vectors to 0-255.
You can divide the frames by 255 before using the transform:
res_vframes = []
for i in range(len(v_frames)):
    res_vframes.append(self.transforms((v_frames[i]/255)))
res_vframes = torch.stack(res_vframes, 0)
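The other option mentioned above, moving the normalization vectors to the 0-255 scale instead, could look roughly like this. It is a NumPy-only sketch on a made-up 2x2 frame; the names mean_255 and std_255 are illustrative. It checks that both routes give the same result:

```python
import numpy as np

# Normalization statistics from the question, defined on the 0-1 scale.
mean = np.array([0.454, 0.390, 0.331])
std = np.array([0.164, 0.187, 0.152])

# The same statistics expressed on the 0-255 scale.
mean_255 = mean * 255.0
std_255 = std * 255.0

# Made-up H x W x C frame with values in 0-255.
frame = np.array(
    [[[165.0, 193.0, 148.0], [166.0, 193.0, 149.0]],
     [[167.0, 193.0, 149.0], [0.0, 255.0, 128.0]]]
)

# Route 1: scale the frame to 0-1, then use the original statistics.
via_scaled_frame = (frame / 255.0 - mean) / std

# Route 2: keep the frame in 0-255 and use the rescaled statistics.
via_scaled_stats = (frame - mean_255) / std_255

print(np.allclose(via_scaled_frame, via_scaled_stats))
```

With torchvision, the rescaled vectors would be passed as transforms.Normalize(mean_255.tolist(), std_255.tolist()) while the frames stay in 0-255.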
Upvotes: 0