BoofKoor

Reputation: 59

Issue with Running PyTorch Model on MPS of Apple M1

I tried to run my model on the MPS backend of my MacBook Air, but the following warnings and error were raised:

UserWarning: The operator 'aten::sgn.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1670525498485/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

UserWarning: Error detected in ConvolutionBackward0. Traceback of forward call that caused the error

(Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1670525498485/work/torch/csrc/autograd/python_anomaly_mode.cpp:119.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

File "/Users/user/miniconda3/envs/torch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Function 'ConvolutionBackward0' returned nan values in its 0th output.

My system specifications are:
  • macOS: Ventura 13.2
  • Chip: Apple M1
  • Python: 3.10.9
  • PyTorch: 1.13.1

Upvotes: 0

Views: 2336

Answers (1)

Rajesh Kontham

Reputation: 349

The second warning reports an error detected in ConvolutionBackward0 and includes a traceback of the forward call that caused it.

  • The error message indicates that ConvolutionBackward0 returned NaN (not-a-number) values in its 0th output, which resulted in a RuntimeError.
  • ConvolutionBackward0 is the autograd node that computes the gradients of a convolution during the backward pass. Convolutions are the core of convolutional neural networks, which are commonly used for image processing tasks.
  • The error message suggests that there may be a problem with the backward pass of the network, which could be caused by a variety of factors, such as:
    1. bad initialization of the weights,
    2. an incorrect network architecture,
    3. insufficient training data, or
    4. problems with the inputs or labels used for training.
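To narrow down which of these causes applies, you can check each place a NaN can enter a single training step. The sketch below is a hypothetical helper (`find_non_finite` is not a PyTorch API), and it assumes an MSE loss purely for illustration; substitute your own loss function. It uses `torch.autograd.detect_anomaly()`, the anomaly mode that produces the "Error detected in ConvolutionBackward0" message from the question.

```python
import torch
import torch.nn as nn


def find_non_finite(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> list:
    """Hypothetical helper: report where NaN/Inf shows up in one training step."""
    problems = []
    # NaN/Inf in the data (cause 4) propagates into every gradient.
    if not torch.isfinite(inputs).all():
        problems.append("inputs")
    if not torch.isfinite(targets).all():
        problems.append("targets")
    # Bad initialization (cause 1) or a diverged update shows up in the parameters.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            problems.append(f"param:{name}")
    model.zero_grad()
    try:
        # Anomaly mode raises RuntimeError at the first backward node that returns NaN.
        with torch.autograd.detect_anomaly():
            # Assumption: MSE loss, only for illustration.
            loss = nn.functional.mse_loss(model(inputs), targets)
            loss.backward()
    except RuntimeError:
        problems.append("backward")
    else:
        # Anomaly mode only checks for NaN, so also look for Inf in the gradients.
        for name, param in model.named_parameters():
            if param.grad is not None and not torch.isfinite(param.grad).all():
                problems.append(f"grad:{name}")
    return problems
```

An empty list means the step was clean; an entry such as `"inputs"` or `"param:0.weight"` tells you which of the causes above to look at first.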

You could try running the model on a different machine to see whether the issue persists; this tells you whether the M1 is the cause. If the model works on the other system, you may want to update PyTorch to the latest version, since the problem could lie in your PyTorch installation.
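A quicker variant of this check, which does not need a second machine, is to compute the same gradients on the CPU and on MPS of the same Mac and compare them. The sketch below assumes an MSE loss and a tolerance of `1e-4`, both chosen only for illustration; `grads_on` and `compare_devices` are hypothetical helper names. If MPS is not available, it compares the CPU against itself, so every entry is trivially `True`.

```python
import copy

import torch
import torch.nn as nn


def grads_on(device: torch.device, model: nn.Module, inputs, targets) -> dict:
    """Run one forward/backward pass on `device` and return the gradients on the CPU."""
    model = copy.deepcopy(model).to(device)  # keep the caller's model untouched
    model.zero_grad()
    # Assumption: MSE loss, only for illustration.
    loss = nn.functional.mse_loss(model(inputs.to(device)), targets.to(device))
    loss.backward()
    return {name: p.grad.detach().cpu() for name, p in model.named_parameters()}


def compare_devices(model: nn.Module, inputs, targets, atol: float = 1e-4) -> dict:
    """Map each parameter name to whether its CPU and MPS gradients agree."""
    reference = grads_on(torch.device("cpu"), model, inputs, targets)
    if torch.backends.mps.is_available():
        other_device = torch.device("mps")
    else:
        other_device = torch.device("cpu")
    other = grads_on(other_device, model, inputs, targets)
    # allclose returns False when either side contains NaN, so NaN gradients show up here.
    return {name: torch.allclose(reference[name], other[name], atol=atol) for name in reference}
```

If some entries are `False` only when MPS is used, the problem is most likely in the MPS backend rather than in the model.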

Also, although the M1 chip is supported by PyTorch, there may be compatibility issues with specific combinations of PyTorch and Python versions. PyTorch provides native builds for Apple silicon (arm64), and it's recommended to use those for optimal performance. Additionally, some packages and libraries used alongside PyTorch may not yet be fully optimized for the M1 chip, which can also cause compatibility issues.
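To confirm which build you are actually running, a short report like the one below can help. It is a sketch: `mps_report` is a hypothetical name, but the calls inside it are standard library and PyTorch functions. A native Apple silicon Python reports `arm64` from `platform.machine()`, while a Python running under Rosetta reports `x86_64`.

```python
import platform

import torch


def mps_report() -> dict:
    """Collect the version and MPS details that matter for Apple silicon issues."""
    return {
        "python": platform.python_version(),
        # "arm64" for a native Apple silicon Python, "x86_64" under Rosetta.
        "machine": platform.machine(),
        "torch": torch.__version__,
        # Was this PyTorch build compiled with MPS support?
        "mps_built": torch.backends.mps.is_built(),
        # Can MPS actually be used on this machine right now?
        "mps_available": torch.backends.mps.is_available(),
    }


if __name__ == "__main__":
    for key, value in mps_report().items():
        print(f"{key}: {value}")
```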

This is likely a compatibility issue, since your first warning says that 'aten::sgn.out' is not currently supported on the MPS backend.

  • MPS (Metal Performance Shaders) is the PyTorch backend that runs tensor operations on the GPU of Apple silicon chips such as the M1, and each operator needs its own MPS implementation.
  • Since 'aten::sgn.out' is not currently supported on MPS, PyTorch runs that operator on the CPU instead. This points to a backend compatibility issue rather than a problem with the PyTorch model itself.
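As far as I know, the CPU fallback in the first warning only happens when the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` is set before `torch` is imported; without it, PyTorch 1.13 raises `NotImplementedError` for unsupported MPS operators. The sketch below isolates the operator named in the warning by comparing `torch.sgn` on the selected device against the CPU result. `pick_device` and `sgn_matches_cpu` are hypothetical helper names, and the test values are arbitrary.

```python
import os

# The fallback flag is read when torch is first imported, so set it beforehand.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch


def pick_device() -> torch.device:
    """Use MPS when this build and machine support it, otherwise the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def sgn_matches_cpu(device: torch.device) -> bool:
    """Check torch.sgn on `device` against the CPU result for a few sample values."""
    x = torch.tensor([-2.0, 0.0, 3.0])
    on_device = torch.sgn(x.to(device)).cpu()
    return torch.equal(on_device, torch.sgn(x))


if __name__ == "__main__":
    device = pick_device()
    print(device, sgn_matches_cpu(device))
```

If this check passes but the full model still produces NaN only on MPS, the cause is more likely another operator in the backward pass than `aten::sgn.out` itself.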

Upvotes: 1
