Reputation: 93
I understand that I need to allocate both the input tensor and the model parameters to the MPS device in order for PyTorch to use my Mac M1 GPU for training. I did just that, and it still gave me this error message:
*Placeholder storage has not been allocated on MPS device!*
Here is a snippet of my code.
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found")
model = LSTMModel(input_size=197,
                  hidden_size=HIDDEN_UNITS,
                  output_size=1,
                  layer_size=2)

# Transfer the model to the GPU
model = model.to(mps_device)
# Iterate over the training data
for i in range(100000):
    # Send the inputs/labels to the GPU
    next_batch = next(train_generate)
    inputs = torch.from_numpy(next_batch[0]).float().to(mps_device)
    labels = torch.from_numpy(next_batch[1][-1]).float().to(mps_device)

    with torch.set_grad_enabled(True):
        outputs = model(inputs)
        loss = loss_function(outputs, labels)

        # Backward pass
        loss.backward()
        optimizer.step()
I followed the documentation on how to use the M1 GPU with PyTorch, but it still didn't work.
Upvotes: 4
Views: 6283
Reputation: 305
Using no_cuda=True solved it for me on an M1 Mac. Like so:
training_args = TrainingArguments(
    ...,
    no_cuda=True
)
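Note that this works by forcing training onto the CPU rather than the MPS backend. Also, depending on your installed transformers version, no_cuda may be deprecated in favor of a use_cpu flag; a minimal sketch of that newer spelling (verify the flag name against your version before relying on it):
training_args = TrainingArguments(
    ...,
    use_cpu=True,  # newer equivalent of no_cuda=True; check your transformers version
)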
Upvotes: 7
Reputation: 93
I have figured out the problem. The model creates internal tensors in forward() that are not registered parameters (weights), so model.to(mps_device) does not move them. I have to place those tensors on mps_device explicitly; specifying model.to(mps_device) alone is not sufficient.
def forward(self, x):
    # Zero-initialize the cell state and hidden state on the MPS device
    c_0 = torch.zeros(self.layer_size, BATCH_SIZE, self.hidden_size).requires_grad_().to(mps_device)
    h_0 = torch.zeros(self.layer_size, BATCH_SIZE, self.hidden_size).requires_grad_().to(mps_device)

    # Run the LSTM over all sequence elements of the mini-batch
    out, (h_t, c_t) = self.lstm(x, (h_0.detach(), c_0.detach()))

    # Final output layer on the last time step
    return self.sig(self.fc(out[-1]))
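A more portable variant (my own sketch, not part of the original fix) is to derive the device and batch size from the input tensor instead of referencing the global mps_device and BATCH_SIZE, so the same forward() runs on CPU, CUDA, or MPS. This assumes the default batch_first=False layout that the out[-1] indexing above implies:
def forward(self, x):
    # Create the initial states on whatever device the input lives on
    h_0 = torch.zeros(self.layer_size, x.size(1), self.hidden_size, device=x.device)
    c_0 = torch.zeros(self.layer_size, x.size(1), self.hidden_size, device=x.device)
    out, (h_t, c_t) = self.lstm(x, (h_0, c_0))
    # Final output layer on the last time step
    return self.sig(self.fc(out[-1]))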
Upvotes: 3