Reputation: 157
I am trying to run a graph model using dgl in Google Colab, but I keep getting an error when training the model. I believe my primary problem is that I cannot install the CUDA build of dgl using:
!pip install dgl-cu111
I get the following errors:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: Could not find a version that satisfies the requirement dgl-cu111 (from versions: none)
ERROR: No matching distribution found for dgl-cu111
When training the model, I get the following error:
load_done
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py:71: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1
warnings.warn("dropout option adds dropout after all but last "
init done
Epoch 1: 0%| | 0/49 [00:00<?, ?it/s]
---------------------------------------------------------------------------
DGLError Traceback (most recent call last)
<ipython-input-7-783797e86ab0> in <cell line: 209>()
207
208
--> 209 train(model)
210 test_func(model, y_test, X_test)
3 frames
<ipython-input-7-783797e86ab0> in train(net)
179 gc.collect()
180 continue
--> 181 acc, loss, _ = fwd_pass(batch_X, batch_y, train=True)
182
183 losses.append(loss.item())
<ipython-input-7-783797e86ab0> in fwd_pass(X, y, train)
108 for item in X:
109 x = [0, 0]
--> 110 x[0] = item[0].to(device)
111 x[1] = item[1].to(device)
112 out.append(model(x))
/usr/local/lib/python3.10/dist-packages/dgl/heterograph.py in to(self, device, **kwargs)
5707
5708 # 1. Copy graph structure
-> 5709 ret._graph = self._graph.copy_to(utils.to_dgl_context(device))
5710
5711 # 2. Copy features
/usr/local/lib/python3.10/dist-packages/dgl/heterograph_index.py in copy_to(self, ctx)
253 The graph index on the given device context.
254 """
--> 255 return _CAPI_DGLHeteroCopyTo(self, ctx.device_type, ctx.device_id)
256
257 def pin_memory(self):
dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FunctionBase.__call__()
dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall()
dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall3()
DGLError: [01:00:58] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
[bt] (0) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x75) [0x7fae2b978e55]
[bt] (1) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPIManager::GetAPI(std::string, bool)+0x1f2) [0x7fae2bcf85f2]
[bt] (2) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPI::Get(DGLContext, bool)+0x1e1) [0x7fae2bcf2ba1]
[bt] (3) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DGLDataType, DGLContext)+0x13b) [0x7fae2bd15acb]
[bt] (4) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::CopyTo(DGLContext const&) const+0xc3) [0x7fae2bd4fe23]
[bt] (5) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::UnitGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0x3ef) [0x7fae2be5d79f]
[bt] (6) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::HeteroGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0xf6) [0x7fae2bd61286]
[bt] (7) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(+0x52cbb6) [0x7fae2bd70bb6]
[bt] (8) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fae2bcf7bb8]
Any thoughts about how to install the dgl-gpu libraries on Google Colab? I am using Colab's A100 GPU:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Upvotes: 3
Views: 3452
Reputation: 107
This answer is for anyone trying to match pytorch and dgl versions.
After a lot of back and forth trying to match python, pytorch and cuda versions [1], the following steps worked for me. (It is easier to start with a new environment, because existing packages can cause a lot of conflicts.)
[1] - https://www.dgl.ai/pages/start.html
## Create a new environment ("myenv" is an arbitrary name; use any name you prefer)
conda create -n myenv python=3.11
## Activate environment
source activate myenv
## Install pytorch 2.2
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
## Install dgl which matches pytorch 2.2 and cuda 12.1
conda install -c dglteam/label/cu121 dgl
## Add environment to jupyter kernel
conda install -c anaconda ipykernel -y
python -m ipykernel install --user --name=myenv
## Install the remaining packages that dgl needs
pip install torchdata
pip install pandas
pip install pyyaml
pip install pydantic
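Once the steps above finish, it can help to confirm what actually landed in the environment before opening a notebook. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the distribution names in the usage part ("torch", "dgl", "torchdata") are assumptions that may differ from what your installer reports.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them as needed.
    for name, version in installed_versions(["torch", "dgl", "torchdata"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If dgl shows up as not installed here, the kernel is probably running in a different environment than the one where the packages were installed.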
Example code
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv

class Classifier(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, out_dim)

    def forward(self, g, h):
        # Apply graph convolution and activation.
        h = F.relu(self.conv1(g, h))
        return h

src_ids = torch.tensor([2, 3, 4])
dst_ids = torch.tensor([1, 2, 3])
device = torch.device('cuda:0')
g = dgl.graph((src_ids, dst_ids)).to(device)
g = dgl.add_self_loop(g)
x = torch.randn((5, 100)).to(device)
model = Classifier(100, 20).to(device)
model(g, x)
Upvotes: 0
Reputation: 472
I've had this problem with a V100 GPU. My workaround was to specify the source:
pip install dgl==1.0.1+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html
Make sure you select the correct CUDA version for your setup.
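To make the CUDA-version step concrete, here is a rough sketch that reads the "release X.Y" field from nvcc --version output and builds a command with the same shape as the one above. Both helper names are hypothetical. The URL pattern is simply generalised from the cu117 link in this answer, so whether a wheel exists for a given tag and dgl version is an assumption you should check against the DGL install page. The CUDA version PyTorch was built with (torch.version.cuda) may also differ from what nvcc reports.

```python
import re


def cuda_tag_from_nvcc(nvcc_output):
    """Turn the 'release X.Y' field of nvcc --version output into a tag like 'cu118'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return f"cu{match.group(1)}{match.group(2)}"


def dgl_install_command(cuda_tag, dgl_version):
    """Build a pip command following the URL pattern used in this answer.

    Assumption: the wheel index for other CUDA tags follows the same pattern.
    """
    return (
        f"pip install dgl=={dgl_version}+{cuda_tag} "
        f"-f https://data.dgl.ai/wheels/{cuda_tag}/repo.html"
    )


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.8, V11.8.89"
    tag = cuda_tag_from_nvcc(sample)
    # The dgl version here is only a placeholder taken from this answer.
    print(dgl_install_command(tag, "1.0.1"))
```

In Colab, prefix the printed command with ! to run it in a cell.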
Upvotes: 3