Reputation: 369
I tried following the tutorial from PyTorch here: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py.
Full code is here:
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Loading and normalizing CIFAR10
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Shows training images, DOESN'T WORK
def imshow(img):
img = img / 2 + 0.5 # unnormalize
npimg = img.numpy()
plt.imshow(np.transpose(npimg, (1, 2, 0)))
plt.show()
# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
# define a convolutional neural network
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 16 * 5 * 5)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
net = Net()
# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# Train the network
for epoch in range(2): # loop over the dataset multiple times
running_loss = 0.0
# DOESN'T WORK
for i, data in enumerate(trainloader, 0):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999: # print every 2000 mini-batches
print('[%d, %5d] loss: %.3f' %
(epoch + 1, i + 1, running_loss / 2000))
running_loss = 0.0
print('Finished Training')
# save trained model
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
# test the network on the test data
dataiter = iter(testloader)
images, labels = dataiter.next()
# print images
dataiter = iter(testloader)
images, labels = dataiter.next()
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
# load back saved model
net = Net()
net.load_state_dict(torch.load(PATH))
# see what the nueral network thinks these examples above are:
ouputs = net(images)
# index of the highest energy
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
for j in range(4)))
# accuracy on the whole dataset
correct = 0
total = 0
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = net(images)
_, predicted = torch.max(outputs.data, 1)
total += labels.size(0)
correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (
100 * correct / total))
# classes that perfomed well vs classes that didn't perform well
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
for data in testloader:
images, labels = data
outputs = net(images)
_, predicted = torch.max(outputs, 1)
c = (predicted == labels).squeeze()
for i in range(4):
label = labels[i]
class_correct[label] += c[i].item()
class_total[label] += 1
for i in range(10):
print('Accuracy of %5s : %2d %%' % (
classes[i], 100 * class_correct[i] / class_total[i]))
if __name__ == '__main__':
torch.multiprocessing.freeze_support()
However I got this issue:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I'm just trying to run this in a regular python file. When I added
if __name__ == '__main__':
freeze_support()
to the end of my file, I still get the error.
Upvotes: 7
Views: 7008
Reputation: 1839
Answering for future me or for anyone who faces this error:
Here's the solution: https://pytorch.org/docs/stable/notes/windows.html#multiprocessing-error-without-if-clause-protection
The implementation of multiprocessing is different on Windows, which uses spawn instead of fork. So we have to wrap the code with an if-clause to protect the code from executing multiple times. Refactor your code into the following structure.
import torch
def main()
# do everything here
if __name__ == '__main__':
main()
Upvotes: 3
Reputation: 151
On MacOS, M1 mac mini, torch version 1.13.1, adding this at the top of the script worked for me, without defining a main
:
# after `import torch`:
import torch.multiprocessing as mp
mp.set_start_method('fork', force=True)
Upvotes: 0
Reputation: 3285
Following worked for me:
import torch.multiprocessing as mp
mp.use_start_method('spawn', force=True)
force is essential as it was returning another error that context has already been set
if __name__ == '__main__':
) at the very first line even before imports (many answers on stackoverflow show that start() and join() method should be in the main and it works well. But I guess I am using several scripts and modules so it is not identifying the proper main so I had to include it in the first line of first file).Upvotes: 0
Reputation: 369
To anyone else with this issue, I believe you need to define a main function and run the training there. Then add:
if __name__ == '__main__':
main()
at the end of the python file.
This fixed the freeze_support() issue for me on a different PyTorch training program.
Upvotes: 12
Reputation: 473
Just set the num_workers
parameter equal to 0
for the train and test DataLoader
. In code just do this:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
shuffle=True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
shuffle=False, num_workers=0)
Upvotes: 2