Reputation: 3454
I'm playing with a couple of projects that explicitly require pytorch == 1.0.0, but I have an old graphics card that only supports CUDA compute capability 3.0, so I've been running on the CPU, which is very slow. Since the card is a dual GPU, I decided to try building PyTorch from source with support for 3.0 (I plan to upgrade the PC, but that won't happen anytime soon).
I am using Docker for the build; in particular, I tried to modify an existing Dockerfile from build-pytorch. The host system runs debian/sid with CUDA 10.2 and cuDNN 7.6 installed. I'm not sure whether I can downgrade CUDA, and I don't know whether the versions in the container must exactly match those on the host (as is the case for the NVIDIA drivers).
Gist of the modified Dockerfile
The first thing I noticed when updating the versions is that the package cuda-cublas-dev-10-2 was not found; the latest version was 10-0. The explanation I found:
"CUBLAS packaging changed in CUDA 10.1 to be outside of the toolkit installation path"
If I install cublas version 10-0, or don't install it at all, the header files are not found (see the missing header error below). If I install the recommended libcublas-dev version, the build continues for a while with some warnings (below), but then stops with the libcublas-dev installed error below.
I searched for the error online but did not find anything specific. If I understand correctly, a function is declared more than once, so the call is ambiguous, but I have not yet looked into the sources.
I would like to know if anyone has run into this error before and knows how to fix it.
libcublas-dev installed error:
[ 67%] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o
/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(58): error: more than one instance of function "at::native::sparse::cuda::cusparseGetErrorString" matches the argument list:
function "cusparseGetErrorString(cusparseStatus_t)"
function "at::native::sparse::cuda::cusparseGetErrorString(cusparseStatus_t)"
argument types are: (cusparseStatus_t)
1 error detected in the compilation of "/tmp/tmpxft_00004ccc_00000000-6_SparseCUDABlas.cpp1.ii".
CMake Error at caffe2_gpu_generated_SparseCUDABlas.cu.o.Release.cmake:279 (message):
Error generating file
/pytorch/build/caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/./caffe2_gpu_generated_SparseCUDABlas.cu.o
caffe2/CMakeFiles/caffe2_gpu.dir/build.make:1260: recipe for target 'caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o' failed
warnings:
ptxas warning : Too big maxrregcount value specified 96, will be ignored
missing header error:
Scanning dependencies of target caffe2_pybind11_state
[ 59%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
In file included from /pytorch/aten/src/THC/THC.h:4:0,
from /pytorch/torch/lib/THD/../THD/base/TensorDescriptor.h:6,
from /pytorch/torch/lib/THD/../THD/base/TensorDescriptor.hpp:6,
from /pytorch/torch/lib/THD/../THD/THD.h:14,
from /pytorch/torch/lib/THD/base/DataChannelRequest.h:3,
from /pytorch/torch/lib/THD/base/DataChannelRequest.hpp:6,
from /pytorch/torch/lib/THD/base/DataChannelRequest.cpp:1:
/pytorch/build/caffe2/aten/src/THC/THCGeneral.h:17:23: fatal error: cublas_v2.h: No such file or directory
compilation terminated.
make[2]: *** [caffe2/torch/lib/THD/CMakeFiles/THD.dir/base/DataChannelRequest.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Upvotes: 0
Views: 769
Reputation: 3454
Apparently the problem was that both libcusparse and aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu implement cusparseGetErrorString(), and for CUDA >= 10.2 the one in the library should be used. The patch below disables the local copy:
--- aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu.orig 2020-11-16 12:13:17.680023134 +0000
+++ aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu 2020-11-16 12:13:45.158407583 +0000
@@ -9,7 +9,7 @@
namespace at { namespace native { namespace sparse { namespace cuda {
-
+#if 0
std::string cusparseGetErrorString(cusparseStatus_t status) {
switch(status)
{
@@ -51,6 +51,7 @@
}
}
}
+#endif
inline void CUSPARSE_CHECK(cusparseStatus_t status)
{
I haven't yet tested whether it works at runtime, but the build succeeds.
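A less blunt alternative to #if 0 would be to compile the local helper only when the library does not already provide one. The following is a minimal, self-contained sketch of that idea, not the actual PyTorch code: demo_status_t, demoGetErrorString and DEMO_LIB_HAS_ERROR_STRING are hypothetical stand-ins for cusparseStatus_t, cusparseGetErrorString and a version check. In the real file the condition would presumably test a version macro from the CUDA headers (for example CUDA_VERSION from cuda.h); I have not verified which macro is the right one.

```cpp
#include <string>

// Hypothetical stand-in for cusparseStatus_t; it lives in the global
// namespace, just like the real enum does.
enum demo_status_t { DEMO_SUCCESS = 0, DEMO_NOT_INITIALIZED = 1 };

// Hypothetical switch: 1 simulates a library that already ships the helper
// (the CUDA >= 10.2 situation), 0 simulates an older library without it.
#ifndef DEMO_LIB_HAS_ERROR_STRING
#define DEMO_LIB_HAS_ERROR_STRING 1
#endif

#if DEMO_LIB_HAS_ERROR_STRING
// What the newer library header would contribute to the global namespace.
const char* demoGetErrorString(demo_status_t status) {
  return status == DEMO_SUCCESS ? "library: success" : "library: failure";
}
#endif

namespace project {

#if !DEMO_LIB_HAS_ERROR_STRING
// Local fallback, compiled only when the library does not provide one.
// Without this guard, the unqualified call below would see this function
// through ordinary lookup and the global one through argument-dependent
// lookup, and the compiler would report the call as ambiguous.
std::string demoGetErrorString(demo_status_t status) {
  return status == DEMO_SUCCESS ? "local: success" : "local: failure";
}
#endif

// Unqualified call, in the same style as a CHECK helper would make it.
inline std::string describe(demo_status_t status) {
  return demoGetErrorString(status);
}

}  // namespace project
```

With DEMO_LIB_HAS_ERROR_STRING left at 1, project::describe() uses the library's function; compiling with -DDEMO_LIB_HAS_ERROR_STRING=0 switches to the local fallback, so the same source builds against both old and new library versions.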
Upvotes: 1