Peter Carles

Reputation: 55

How can I compute the mean and std of a dataset?

I am using PyTorch with the Fashion MNIST dataset, but I do not know how to compute the mean and std of this dataset. Here is my code:

import torch
from torchvision import datasets, transforms
import torch.nn.functional as F

# mean and std are the values I do not know how to compute
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((mean,), (std,))])
batch_size = 32
train_loader = torch.utils.data.DataLoader(
    datasets.FashionMNIST('../data', train=True, download=True, transform=transform),
    batch_size=batch_size, shuffle=True)

Could you help me, please?

Thank you very much!

Upvotes: 1

Views: 3036

Answers (2)

Eric

Reputation: 28

This link gives different std values for Fashion MNIST: https://github.com/keon/3-min-pytorch/issues/26. I think the difference is because you are computing the std per batch and then averaging, rather than over the whole dataset. The code from the link is:

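import torchvision
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor()
])

dataset = torchvision.datasets.FashionMNIST(root='./.data', train=True,
                                            download=True, transform=transform)

# dataset.data holds the raw uint8 pixels (0-255), so divide by 255
# to match the [0, 1] range produced by ToTensor()
FashionMNIST_mean = dataset.data.numpy().mean(axis=(0, 1, 2))
print(FashionMNIST_mean / 255)
FashionMNIST_std = dataset.data.numpy().std(axis=(0, 1, 2))
print(FashionMNIST_std / 255)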
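
To illustrate why the two approaches can disagree, here is a minimal sketch using only the standard library. The two tiny "images" are made up for the example and are not Fashion MNIST data. It compares the average of per-image standard deviations with the standard deviation taken over all pixels pooled together.

```python
import statistics

# Two made-up "images", each flattened to a list of pixel values in [0, 1].
images = [
    [0.0, 0.0, 0.0, 0.0],  # uniformly dark image: its own std is 0
    [1.0, 1.0, 1.0, 1.0],  # uniformly bright image: its own std is 0
]

# Approach 1: compute the std of each image, then average those stds.
per_image_std = [statistics.pstdev(img) for img in images]
mean_of_stds = sum(per_image_std) / len(per_image_std)

# Approach 2: pool every pixel and compute a single std.
all_pixels = [p for img in images for p in img]
global_std = statistics.pstdev(all_pixels)

print(mean_of_stds)  # 0.0
print(global_std)    # 0.5
```

Each image has no internal variation, so the averaged std is 0, while the pooled std reflects the brightness difference between images. The values a normalization step usually expects are the pooled ones.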

Upvotes: 0

Rishit Dagli

Reputation: 1006

Use this to calculate the mean and std:

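from torch.utils import data

# `dataset` is assumed to be an image dataset whose transform includes ToTensor()
loader = data.DataLoader(dataset,
                         batch_size=10,
                         num_workers=0,
                         shuffle=False)

mean = 0.
std = 0.
for images, _ in loader:
    batch_samples = images.size(0)  # batch size (the last batch can have smaller size!)
    images = images.view(batch_samples, images.size(1), -1)  # flatten H x W per channel
    mean += images.mean(2).sum(0)  # accumulate per-image channel means
    std += images.std(2).sum(0)    # accumulate per-image channel stds

# Average over the number of images. Note that std here is the mean of the
# per-image stds, which is not the same as the std over all pixels in the dataset.
mean /= len(loader.dataset)
std /= len(loader.dataset)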
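
If you want the std over all pixels of the dataset while still iterating in batches, one option is to accumulate the sum and the sum of squares per channel and combine them at the end. The following is only a sketch under assumptions: it uses random tensors wrapped in a `TensorDataset` as a stand-in for a real image dataset, and it computes the population std (dividing by N), which is what NumPy's `std` does by default.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset (an assumption for this sketch):
# 100 single-channel 28x28 "images" with values in [0, 1].
torch.manual_seed(0)
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels),
                    batch_size=32, shuffle=False)

num_channels = fake_images.size(1)
channel_sum = torch.zeros(num_channels, dtype=torch.float64)
channel_sq_sum = torch.zeros(num_channels, dtype=torch.float64)
num_pixels = 0  # pixels per channel seen so far

for images, _ in loader:
    images = images.double()  # float64 reduces accumulation error
    # Sum over batch, height, and width, keeping the channel dimension.
    channel_sum += images.sum(dim=(0, 2, 3))
    channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    num_pixels += images.size(0) * images.size(2) * images.size(3)

mean = channel_sum / num_pixels
# Population variance per channel: E[x^2] - (E[x])^2
std = (channel_sq_sum / num_pixels - mean ** 2).sqrt()

print(mean, std)
```

With a real dataset you would replace the synthetic loader with your own, and the results could then be passed on as `transforms.Normalize(mean.tolist(), std.tolist())`.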

Upvotes: 5
