optimal_transport_fan
optimal_transport_fan

Reputation: 369

PyTorch Tutorial freeze_support() issue

I tried following the tutorial from PyTorch here: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py.

Full code is here:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


# Loading and normalizing CIFAR10
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


# Shows training images, DOESN'T WORK

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


# define a convolutional neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the network
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    
    # DOESN'T WORK
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

# test the network on the test data
dataiter = iter(testloader)
images, labels = dataiter.next()

# print images
dataiter = iter(testloader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))


# load back saved model
net = Net()
net.load_state_dict(torch.load(PATH))

# see what the nueral network thinks these examples above are:
ouputs = net(images)

# index of the highest energy
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

# accuracy on the whole dataset
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

# classes that perfomed well vs classes that didn't perform well
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))


if __name__ == '__main__':
    torch.multiprocessing.freeze_support()

However I got this issue:

An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

I'm just trying to run this in a regular python file. When I added

if __name__ == '__main__':
                freeze_support()

to the end of my file, I still get the error.

Upvotes: 7

Views: 7008

Answers (5)

R.K
R.K

Reputation: 1839

Answering for future me or for anyone who faces this error:

Here's the solution: https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection

The implementation of multiprocessing is different on Windows, which uses spawn instead of fork. So we have to wrap the code with an if-clause to protect the code from executing multiple times. Refactor your code into the following structure.

import torch

def main()
    # do everything here

if __name__ == '__main__':
    main()

Upvotes: 3

Kevin Newman
Kevin Newman

Reputation: 151

On MacOS, M1 mac mini, torch version 1.13.1, adding this at the top of the script worked for me, without defining a main:

# after `import torch`:    

import torch.multiprocessing as mp

mp.set_start_method('fork', force=True)

Upvotes: 0

devil in the detail
devil in the detail

Reputation: 3285

Following worked for me:

  1. Using spawn start method

import torch.multiprocessing as mp

mp.use_start_method('spawn', force=True)

force is essential as it was returning another error that context has already been set

  1. Use main function (if __name__ == '__main__':) at the very first line even before imports (many answers on stackoverflow show that start() and join() method should be in the main and it works well. But I guess I am using several scripts and modules so it is not identifying the proper main so I had to include it in the first line of first file).

Upvotes: 0

optimal_transport_fan
optimal_transport_fan

Reputation: 369

To anyone else with this issue, I believe you need to define a main function and run the training there. Then add:

if __name__ == '__main__':
    main()

at the end of the python file.

This fixed the freeze_support() issue for me on a different PyTorch training program.

Upvotes: 12

melqkiades
melqkiades

Reputation: 473

Just set the num_workers parameter equal to 0 for the train and test DataLoader. In code just do this:

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=0)

testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=0)

Upvotes: 2

Related Questions