Halide CUDA sample app gives strange run times

Question

I have some familiarity with Halide and am starting to learn to use CUDA with it. To start with I ran the halide cuda_mat_mul that comes with Halide source code. I got some reasonable if unimpressive timings:

CPU, autoschedule, Adams2019:    4.2ms
GPU, autoschedule, Anderson2021: 3.0ms
GPU, manual schedule:            1.2ms
CUBLAS:                          0.42ms

Does this seem right? I have an Nvidia RTX 3050 Ti laptop GPU and a core i5-11400h CPU

I then tried to get another sample app: camera_pipe running on the GPU. It comes with schedules for both CPU and GPU. The CMake file is for CPU only. I modified it to do a CUDA build by setting FEATURES cuda cuda_capability_50 and giving it CUDA_INCLUDE_DIRS and CUDA_LIBRARIES just like in the cuda_mat_mul app. I also added output.copy_to_host(); in process.cpp.

I recorded the following run times:

CUDA (manual):  1270ms

cpu auto_schedule Adams19: 10.5ms
cpu manual:                 9.9ms

So CUDA was way slower than CPU.

This was with a single timing iteration. I then tried doing 100 iterations.

CUDA manual 1st iteration: 1270ms
      next 100 iterations:    5.7ms

I then tried 100 iterations on CPU:

cpu manual: 1st iteration: 10.8ms
       next 100 iterations: 5.8ms
cpu auto_schedule (Adams19) 100 iterations:   7ms

Why is the first GPU iteration so slow? Why are subsequent runs almost the same speed on CPU and GPU? I verified that it was generating the correct output image. I also tried setting input.set_host_dirty(); but it made no difference.

I tried auto scheduling on GPU using Anderson2021 but got the following error:

C:\Users\cordo\source\repos\camera_pipe17\out\build\x64-Debug\camera_pipe_auto_schedule.runtime.lib(camera_pipe_auto_schedule.runtime.obj) : error LNK2005: .weak._ZN6Halide7Runtime8Internal13custom_mallocE.default.halide_internal_aligned_alloc already defined in camera_pipe.runtime.lib(camera_pipe.runtime.obj)

There were several more similar looking errors. Thanks

Halide CUDA sample app gives strange run times

Answers (1)

Related Questions