Reputation: 33
I'm trying to train YOLOv4 with darknet on a computing cluster, but when I run make to build darknet, the link step fails with:
/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
make: *** [darknet] Error 1
This computing cluster provides software through module load. For example, when I need CUDA 10.2, I just run module load devel/cuda/10.2.
So the CUDA files are located in a system directory, and I don't have permission to modify any of them.
In this case, how can I fix this problem?
More details about this error (full output of make):
[usr@*hpc darknet]$ make
chmod +x *.sh
g++ -std=c++11 -std=c++11 -Iinclude/ -I3rdparty/stb/include -DOPENCV `pkg-config --cflags opencv4 2> /dev/null || pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -I/usr/local/cudnn/include obj/image_opencv.o obj/http_stream.o obj/gemm.o obj/utils.o obj/dark_cuda.o obj/convolutional_layer.o obj/list.o obj/image.o obj/activations.o obj/im2col.o obj/col2im.o obj/blas.o obj/crop_layer.o obj/dropout_layer.o obj/maxpool_layer.o obj/softmax_layer.o obj/data.o obj/matrix.o obj/network.o obj/connected_layer.o obj/cost_layer.o obj/parser.o obj/option_list.o obj/darknet.o obj/detection_layer.o obj/captcha.o obj/route_layer.o obj/writing.o obj/box.o obj/nightmare.o obj/normalization_layer.o obj/avgpool_layer.o obj/coco.o obj/dice.o obj/yolo.o obj/detector.o obj/layer.o obj/compare.o obj/classifier.o obj/local_layer.o obj/swag.o obj/shortcut_layer.o obj/activation_layer.o obj/rnn_layer.o obj/gru_layer.o obj/rnn.o obj/rnn_vid.o obj/crnn_layer.o obj/demo.o obj/tag.o obj/cifar.o obj/go.o obj/batchnorm_layer.o obj/art.o obj/region_layer.o obj/reorg_layer.o obj/reorg_old_layer.o obj/super.o obj/voxel.o obj/tree.o obj/yolo_layer.o obj/gaussian_yolo_layer.o obj/upsample_layer.o obj/lstm_layer.o obj/conv_lstm_layer.o obj/scale_channels_layer.o obj/sam_layer.o obj/convolutional_kernels.o obj/activation_kernels.o obj/im2col_kernels.o obj/col2im_kernels.o obj/blas_kernels.o obj/crop_layer_kernels.o obj/dropout_layer_kernels.o obj/maxpool_layer_kernels.o obj/network_kernels.o obj/avgpool_layer_kernels.o -o darknet -lm -pthread `pkg-config --libs opencv4 2> /dev/null || pkg-config --libs opencv` -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand -L/usr/local/cudnn/lib64 -lcudnn -lstdc++
/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
make: *** [darknet] Error 1
Upvotes: 2
Views: 1753
Reputation: 152143
On a machine with a GPU (and driver) installed, the -lcuda dependency can usually be satisfied because the driver installs libcuda.so (or the equivalent on Windows) in the linker's search path.
However, on a machine with no GPU installed (e.g. a login node or build machine in a cluster), the driver won't be installed, and therefore libcuda.so won't be in the "usual place".
In these situations, "stub" libraries are provided, usually in the stubs directory under the CUDA toolkit library install directory (e.g. /usr/local/cuda/lib64/stubs).
Therefore, if you change the line in your Makefile that adds the CUDA libraries to LDFLAGS so that it reads:
LDFLAGS+= -L/usr/local/cuda/lib64 -lcudart -lcublas -lcurand -L/usr/local/cuda/lib64/stubs -lcuda
the linker should be able to locate libcuda.
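On a cluster where CUDA comes from module load, the toolkit may not actually live under /usr/local/cuda, so the stubs directory may be somewhere else. The following is a minimal shell sketch for finding it, not a definitive recipe. It assumes the loaded module puts nvcc on PATH as <toolkit root>/bin/nvcc, and that the stub sits in one of the two candidate subdirectories listed. find_cuda_stubs is a hypothetical helper name, not part of darknet or CUDA.

```shell
#!/bin/sh
# find_cuda_stubs: hypothetical helper (not part of darknet or CUDA).
# Given a CUDA toolkit root, print the directory that holds the stub
# libcuda.so, or return non-zero if none of the candidates has one.
find_cuda_stubs() {
    root="$1"
    # Candidate locations are an assumption about the toolkit layout.
    for dir in "$root/lib64/stubs" "$root/targets/x86_64-linux/lib/stubs"; do
        if [ -e "$dir/libcuda.so" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Guess the toolkit root from nvcc, assuming the module put it on PATH
# and that it is installed as <root>/bin/nvcc.
if command -v nvcc >/dev/null 2>&1; then
    cuda_root=$(dirname "$(dirname "$(command -v nvcc)")")
    find_cuda_stubs "$cuda_root" ||
        echo "no stub libcuda.so found under $cuda_root" >&2
fi
```

Run it after module load devel/cuda/10.2 and use the printed directory in the -L option that precedes -lcuda in the Makefile. The stub is only meant to satisfy the linker at build time; when the program runs on a GPU node, the real libcuda.so installed by the driver is the one that should be loaded.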
Upvotes: 4