Using Spring Boot with an NVIDIA GPU (CUDA)

Question

I am working on offloading some workloads to the GPU using CUDA in a Spring Boot project. To make the question concrete, suppose we want to implement a REST API that performs matrix-vector multiplication in a Spring Boot application. On application launch, we load several matrices of various sizes into GPU memory. We then accept user requests containing vector data, find the corresponding matrix in GPU memory, perform the matrix-vector multiplication, and return the result to the user. We have already implemented the kernel using JCuda.
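The request flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (`MatrixRegistry`, `load`, `multiply`) are hypothetical, a `float[][]` stands in for a matrix resident in GPU memory, and a plain CPU loop stands in for the JCuda kernel launch. It is not the actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the scenario. A float[][] stands in for a matrix
// held in GPU memory, and the CPU loop stands in for the JCuda kernel.
class MatrixRegistry {
    // Matrices keyed by an ID that the REST request would carry.
    private final Map<String, float[][]> matrices = new ConcurrentHashMap<>();

    // Would be called once per matrix at application launch.
    void load(String id, float[][] matrix) {
        matrices.put(id, matrix);
    }

    // Would be called per REST request: find the matrix, multiply, return.
    float[] multiply(String id, float[] vector) {
        float[][] m = matrices.get(id);
        if (m == null) {
            throw new IllegalArgumentException("Unknown matrix: " + id);
        }
        float[] result = new float[m.length];
        for (int row = 0; row < m.length; row++) {
            if (m[row].length != vector.length) {
                throw new IllegalArgumentException("Dimension mismatch in row " + row);
            }
            float sum = 0f;
            for (int col = 0; col < vector.length; col++) {
                sum += m[row][col] * vector[col];
            }
            result[row] = sum;
        }
        return result;
    }
}
```

In the real application, `load` would allocate device memory and copy the matrix to the GPU, and `multiply` would be invoked from a REST controller, possibly by many request threads at once.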

We want to process user requests concurrently in this scenario, so I have several questions:

How can we avoid CUDA out-of-memory errors when there are many REST API calls?
If we use explicit CUDA streams to improve the application's throughput, how should we determine the number of streams?
If we also need to perform CUD (create, update, delete) operations on the matrices in GPU memory while processing REST API calls, how can we make these operations and the matrix-vector multiplications atomic with respect to each other?

Answers (0)