Alex

Reputation: 3454

Trying to build PyTorch 1.0.0 with CUDA 10.2 and support for an old GPU (compute capability 3.0)

I'm playing with a couple of projects that explicitly require pytorch == 1.0.0, but my old graphics card only supports CUDA compute capability 3.0, so I have been running on the CPU, which is very slow. Since the card is a dual-GPU model, I decided to try building PyTorch from source with support for compute capability 3.0. (I plan to upgrade the PC, but that is not going to happen anytime soon.)

I am doing the build in Docker, starting from an existing Dockerfile from build-pytorch that I modified. The host runs debian/sid with CUDA 10.2 and cuDNN 7.6 installed. I'm not sure whether I can downgrade CUDA, and I don't know whether the versions in the container must match the host exactly (as is the case for the NVIDIA drivers).

Gist of the modified Dockerfile

The first thing I noticed after updating the versions is that the package cuda-cublas-dev-10-2 was not found; the latest available version was 10-0. This is because cuBLAS packaging changed in CUDA 10.1 to be outside of the toolkit installation path.

If I install cublas version 10-0, or don't install it at all, the header files are not found (see the missing header error below). If I install the recommended libcublas-dev version, the build continues for a while with some warnings (below), but then stops with the first error below.

I searched for the error online but found nothing specific. If I understand correctly, a function is declared more than once and the call to it is ambiguous, but I have not yet looked at the sources.
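
As a rough illustration of that reading of the error, here is a minimal C++ sketch of how such an ambiguity can arise. All names (demoStatus_t, demoGetErrorString, the project::sparse namespace) are made-up stand-ins, not the actual PyTorch or cuSPARSE declarations. The assumption is that one function is declared at global scope by a library header and another with the same name and parameter list lives inside a project namespace.

```cpp
#include <string>

// Hypothetical stand-in for a status enum that a library header declares
// at global scope.
enum demoStatus_t { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_FAILED = 1 };

// Hypothetical stand-in for an error-string function that the library
// header also declares at global scope.
const char* demoGetErrorString(demoStatus_t status) {
  return status == DEMO_STATUS_SUCCESS ? "library: success" : "library: failed";
}

namespace project { namespace sparse {

// Hypothetical project-local helper with the same name and parameter list.
std::string demoGetErrorString(demoStatus_t status) {
  return status == DEMO_STATUS_SUCCESS ? "project: success" : "project: failed";
}

std::string describe_with_library(demoStatus_t status) {
  // An unqualified call, demoGetErrorString(status), would not compile here.
  // Ordinary lookup finds project::sparse::demoGetErrorString, and
  // argument-dependent lookup adds ::demoGetErrorString because demoStatus_t
  // is declared in the global namespace. Both accept exactly demoStatus_t,
  // so overload resolution cannot pick one.
  return ::demoGetErrorString(status);  // qualified: selects the global function
}

std::string describe_with_project(demoStatus_t status) {
  // Qualified: selects the namespace-local function.
  return project::sparse::demoGetErrorString(status);
}

}}  // namespace project::sparse
```

Qualifying the call is one way to make the choice explicit; removing one of the two definitions is another.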

I would like to know if anyone has run into this error before and knows how to fix it.

libcublas-dev installed error:

[ 67%] Building NVCC (Device) object caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o
/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(58): error: more than one instance of function "at::native::sparse::cuda::cusparseGetErrorString" matches the argument list:
            function "cusparseGetErrorString(cusparseStatus_t)"
            function "at::native::sparse::cuda::cusparseGetErrorString(cusparseStatus_t)"
            argument types are: (cusparseStatus_t)

1 error detected in the compilation of "/tmp/tmpxft_00004ccc_00000000-6_SparseCUDABlas.cpp1.ii".
CMake Error at caffe2_gpu_generated_SparseCUDABlas.cu.o.Release.cmake:279 (message):
  Error generating file
  /pytorch/build/caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/./caffe2_gpu_generated_SparseCUDABlas.cu.o


caffe2/CMakeFiles/caffe2_gpu.dir/build.make:1260: recipe for target 'caffe2/CMakeFiles/caffe2_gpu.dir/__/aten/src/ATen/native/sparse/cuda/caffe2_gpu_generated_SparseCUDABlas.cu.o' failed

warnings:

ptxas warning : Too big maxrregcount value specified 96, will be ignored

missing header error:

Scanning dependencies of target caffe2_pybind11_state
[ 59%] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
In file included from /pytorch/aten/src/THC/THC.h:4:0,
                 from /pytorch/torch/lib/THD/../THD/base/TensorDescriptor.h:6,
                 from /pytorch/torch/lib/THD/../THD/base/TensorDescriptor.hpp:6,
                 from /pytorch/torch/lib/THD/../THD/THD.h:14,
                 from /pytorch/torch/lib/THD/base/DataChannelRequest.h:3,
                 from /pytorch/torch/lib/THD/base/DataChannelRequest.hpp:6,
                 from /pytorch/torch/lib/THD/base/DataChannelRequest.cpp:1:
/pytorch/build/caffe2/aten/src/THC/THCGeneral.h:17:23: fatal error: cublas_v2.h: No such file or directory
compilation terminated.
make[2]: *** [caffe2/torch/lib/THD/CMakeFiles/THD.dir/base/DataChannelRequest.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Upvotes: 0

Views: 769

Answers (1)

Alex

Reputation: 3454

Apparently the problem was that both libcusparse and aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu define cusparseGetErrorString(). For version 10.2 and later, the one in the library should be used, so I disabled the local definition with #if 0:

--- aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu.orig 2020-11-16 12:13:17.680023134 +0000
+++ aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu  2020-11-16 12:13:45.158407583 +0000
@@ -9,7 +9,7 @@
 
 namespace at { namespace native { namespace sparse { namespace cuda {
 
-
+#if 0
 std::string cusparseGetErrorString(cusparseStatus_t status) {
   switch(status)
   {
@@ -51,6 +51,7 @@
       }
   }
 }
+#endif
 
 inline void CUSPARSE_CHECK(cusparseStatus_t status)
 {

I haven't yet tested whether it works at runtime, but the build succeeds.
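
Since #if 0 removes the local helper unconditionally, the same source would probably no longer build against an older toolkit that lacks the library function. A version guard is one possible alternative. The sketch below only shows the pattern with made-up stand-in names (DEMO_VER_MAJOR, DEMO_VER_MINOR, demoGetErrorString); in the real file the guard would presumably test version macros from cusparse.h such as CUSPARSE_VER_MAJOR and CUSPARSE_VER_MINOR, which is an assumption I have not checked against this build.

```cpp
#include <string>

// Hypothetical stand-ins for version macros that a library header would define.
#ifndef DEMO_VER_MAJOR
#define DEMO_VER_MAJOR 10
#endif
#ifndef DEMO_VER_MINOR
#define DEMO_VER_MINOR 2
#endif

// Nonzero when the (stand-in) library is new enough to ship its own
// error-string function, here assumed to be version 10.2 or later.
#define DEMO_LIBRARY_HAS_ERROR_STRING \
  (DEMO_VER_MAJOR > 10 || (DEMO_VER_MAJOR == 10 && DEMO_VER_MINOR >= 2))

enum demoStatus_t { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_FAILED = 1 };

#if DEMO_LIBRARY_HAS_ERROR_STRING
// Stand-in for the function provided by the newer library header.
const char* demoGetErrorString(demoStatus_t status) {
  return status == DEMO_STATUS_SUCCESS ? "library: success" : "library: failed";
}
#endif

namespace project { namespace sparse {

#if !DEMO_LIBRARY_HAS_ERROR_STRING
// Local fallback, compiled only when the library does not provide the function.
std::string demoGetErrorString(demoStatus_t status) {
  return status == DEMO_STATUS_SUCCESS ? "project: success" : "project: failed";
}
#endif

std::string describe(demoStatus_t status) {
  // Exactly one definition is visible for any given version, so the
  // unqualified call is unambiguous.
  return demoGetErrorString(status);
}

}}  // namespace project::sparse
```

With the default stand-in version of 10.2, describe() uses the library function; compiling with -DDEMO_VER_MINOR=1 switches to the local fallback.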

Upvotes: 1
