Cara Duf

Reputation: 371

Import error while launching PyTorch Lightning project on Colab TPU

I followed this guide to launch my PyTorch Lightning project on Google Colab TPU. So I installed

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl

Then

!pip install pytorch-lightning

Then I ran

!pip install torch torchvision torchaudio 
!pip install -r requirements.txt

After installing the project requirements, I restarted the runtime as requested and re-ran the cloud-tpu-client install, the pytorch-lightning install, and both commands above. Everything ran smoothly.

But just after the TPU starts up with PyTorch version 1.9, I get the following import error:

WARNING:root:TPU has started up successfully with version pytorch-1.9
        Traceback (most recent call last):
          File "synthesizer_train.py", line 2, in <module>
            from synthesizer.train import train
          File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
            from synthesizer.models.tacotron import Tacotron
          File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
            import pytorch_lightning as pl
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
            from pytorch_lightning.callbacks import Callback  # noqa: E402
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
            from pytorch_lightning.callbacks.base import Callback
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
            from pytorch_lightning.utilities.types import STEP_OUTPUT
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
            from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 26, in <module>
            from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_AVAILABLE
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 101, in <module>
            from pytorch_lightning.utilities.xla_device import XLADeviceUtils  # noqa: E402
          File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/xla_device.py", line 24, in <module>
            import torch_xla.core.xla_model as xm
          File "/usr/local/lib/python3.7/dist-packages/torch_xla/__init__.py", line 142, in <module>
            import _XLAC
        ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at13_foreach_erf_EN3c108ArrayRefINS_6TensorEEE

The Trainer was launched with the flag tpu_cores=8.
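
For context, the Trainer is created roughly like in the minimal sketch below; the LightningModule and the random dataloader here are just placeholders to show the flag, not the actual code from synthesizer/train.py:

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# Placeholder LightningModule; the real project wraps a Tacotron model instead.
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

train_loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

# tpu_cores=8 (Lightning 1.x API) asks Lightning to spawn one process per TPU core.
trainer = pl.Trainer(tpu_cores=8, max_epochs=1)
trainer.fit(LitModel(), train_loader)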

The model had already run on CPU and GPU beforehand (i.e., in another session).

I tried downgrading PyTorch to 1.9 (the version shown when the TPU starts up), since Colab ships with torch 1.10.0+cu111, and a different error appeared:

WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
  File "synthesizer_train.py", line 2, in <module>
    from synthesizer.train import train
  File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
    from synthesizer.models.tacotron import Tacotron
  File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 29, in <module>
    if _compare_version("torchtext", operator.ge, "0.9.0"):
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 54, in _compare_version
    pkg = importlib.import_module(package)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/__init__.py", line 5, in <module>
    from . import vocab
  File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/__init__.py", line 11, in <module>
    from .vocab_factory import (
  File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py", line 4, in <module>
    from torchtext._torchtext import (
ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE

Is there anything I can do to train the model on a TPU?

Thank you very much

Upvotes: 1

Views: 5648

Answers (2)

whywhywhy

Reputation: 278

Building on the solution above, you can also reliably fix the issue by first checking which CUDA version is installed:

import torch
torch.version.cuda

10.2

Based on that CUDA version (10.2 here), run the corresponding pip install command:

!pip install cloud-tpu-client==0.10 torchvision==0.12.0+cu102 torch==1.11.0+cu102 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.11-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/cu102/torch_stable.html

Please note the cu102 that appears in three places in the command above.
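
If you prefer not to edit the cu tags by hand, a small sketch like the following (not part of any official tooling) can build the command from torch.version.cuda; the package versions hard-coded below are simply the ones from the command above:

import torch

# torch.version.cuda is a string such as "10.2"; it is None on CPU-only builds.
assert torch.version.cuda is not None, "CPU-only torch build: no CUDA version to match"
cu_tag = "cu" + torch.version.cuda.replace(".", "")  # e.g. "10.2" -> "cu102"

# Versions copied from the command above; adjust them if you target other releases.
cmd = (
    "pip install cloud-tpu-client==0.10 "
    f"torchvision==0.12.0+{cu_tag} torch==1.11.0+{cu_tag} "
    "https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.11-cp37-cp37m-linux_x86_64.whl "
    f"-f https://download.pytorch.org/whl/{cu_tag}/torch_stable.html"
)
print(cmd)  # paste the printed command into a !-cell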

Upvotes: 1

Cara Duf

Reputation: 371

The same problem has also been described elsewhere, and the suggested solution there worked for me.

In detail, they suggest downgrading PyTorch to 1.9.0+cu111 (mind the +cu111) after installing torch_xla.

Consequently, here are the steps I followed to launch my Lightning project on Google Colab with a TPU:

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html

And then the project's own pip installs:

!pip install torch torchvision torchaudio pytorch-lightning
!pip install -r requirements.txt

And it worked, although after this last step I had to restart the runtime.
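
After the restart, a quick sanity check that the torch / torch_xla pair now matches (the thing that was broken before) can look like this:

import torch
import torch_xla.core.xla_model as xm

print(torch.__version__)  # should report the downgraded 1.9.0+cu111

# If the versions match, acquiring the XLA device succeeds instead of raising
# the undefined-symbol ImportError from the question.
device = xm.xla_device()
print(device)  # prints the TPU's XLA device, e.g. xla:1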

Upvotes: 1
