Reputation: 139
First off, I searched for this question before posting (I thought people would run into it frequently), but could not find anything similar. I have multiple images to process, and the processing is done by several kernels. For example:
md = true;
while(md) {
kernel1<<<...>>>(image1, md);
kernel2<<<...>>>(image1, md); //image1 here is the image modified by kernel1
kernel3<<<...>>>(image1, md); //image1 here is the image modified by kernel2
}
md = true;
while(md) {
kernel1<<<...>>>(imageN, md);
kernel2<<<...>>>(imageN, md); //imageN here is the image modified by kernel1
kernel3<<<...>>>(imageN, md); //imageN here is the image modified by kernel2
}
The processing for a particular image stops when md for that image is set to false by any kernel. The number of images is not fixed. Can I process the images in parallel using streams? If so, how will I know when a kernel belonging to a stream has finished, so that I can launch the next kernel for that particular image? Should I poll in an infinite while loop on the host? I was thinking of dynamic parallelism, but I am developing for CUDA compute capability 3.0. Thanks a lot for your time.
Edited: According to comment by VAnderi
Upvotes: 1
Views: 1148
Reputation: 5570
I think you can use CUDA streams for this task, but they only pay off if you have multiple images.
For example, you can create two streams: one that processes odd-numbered images and one that processes even-numbered images. In each stream you enqueue kernel1, kernel2 and kernel3; this way kernel2 is guaranteed to wait for kernel1, and so on. See this presentation.
The stream behaves like a queue. If you push the kernels into the stream, they will run in the order you enqueued them. See this post for more information.
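I don't recommend putting kernel1, kernel2 and kernel3 on different streams. Each kernel depends on the output of the previous one, so you would need explicit synchronization between the streams (for example with events), which only makes the situation worse.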
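To make the "one queue per image" idea concrete, here is a minimal sketch of how the host could drive the while(md) loop without busy-waiting. It assumes one stream per image, a per-image md flag kept in device memory, and a pinned host copy of the flags; `stepKernel` and `processImages` are hypothetical names, and the kernel is only a toy stand-in for kernel1, kernel2 and kernel3 (it decrements pixels and clears md when pixel 0 reaches zero). Error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel1..kernel3: decrement every positive pixel
// and clear this image's flag once pixel 0 has reached zero.
__global__ void stepKernel(int *image, int n, int *md)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (image[i] > 0) image[i] -= 1;
    if (i == 0 && image[0] == 0) *md = 0;
}

// Runs the three-kernel loop for every image, one stream per image.
// Returns how many rounds (kernel1 + kernel2 + kernel3) each image needed.
std::vector<int> processImages(int numImages, int n)
{
    std::vector<cudaStream_t> streams(numImages);
    std::vector<int *> dImage(numImages), dMd(numImages);
    std::vector<int> rounds(numImages, 0);
    std::vector<bool> active(numImages, true);

    int *hMd = nullptr;  // pinned host flags, so the async copies do not block
    cudaMallocHost(&hMd, numImages * sizeof(int));

    for (int k = 0; k < numImages; ++k) {
        cudaStreamCreate(&streams[k]);
        cudaMalloc(&dImage[k], n * sizeof(int));
        cudaMalloc(&dMd[k], sizeof(int));
        std::vector<int> host(n, 3 * (k + 1));  // toy data: image k needs k+1 rounds
        cudaMemcpy(dImage[k], host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        hMd[k] = 1;  // md = true
        cudaMemcpy(dMd[k], &hMd[k], sizeof(int), cudaMemcpyHostToDevice);
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int remaining = numImages;

    while (remaining > 0) {
        // Enqueue one round for every unfinished image; none of these calls block.
        for (int k = 0; k < numImages; ++k) {
            if (!active[k]) continue;
            stepKernel<<<blocks, threads, 0, streams[k]>>>(dImage[k], n, dMd[k]);
            stepKernel<<<blocks, threads, 0, streams[k]>>>(dImage[k], n, dMd[k]);
            stepKernel<<<blocks, threads, 0, streams[k]>>>(dImage[k], n, dMd[k]);
            // Queued behind the kernels, so it sees the flag they left.
            cudaMemcpyAsync(&hMd[k], dMd[k], sizeof(int),
                            cudaMemcpyDeviceToHost, streams[k]);
        }
        // Wait for each stream in turn, then decide whether that image goes on.
        for (int k = 0; k < numImages; ++k) {
            if (!active[k]) continue;
            cudaStreamSynchronize(streams[k]);
            ++rounds[k];
            if (hMd[k] == 0) { active[k] = false; --remaining; }
        }
    }

    for (int k = 0; k < numImages; ++k) {
        cudaFree(dImage[k]);
        cudaFree(dMd[k]);
        cudaStreamDestroy(streams[k]);
    }
    cudaFreeHost(hMd);
    return rounds;
}
```

With the toy data above, `processImages(3, 1000)` should report 1, 2 and 3 rounds for the three images. The host never needs to know when an individual kernel finishes: ordering inside a stream takes care of that, and the only synchronization point is the one where the host reads the flag. If blocking in `cudaStreamSynchronize` is undesirable, `cudaStreamQuery` can be polled instead.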
Regarding dynamic parallelism: it requires compute capability 3.5 or higher, so it is not available on a 3.0 device. Streams are also what you use to overlap memory copies with kernels working on another data set, so you could squeeze out more performance by copying the next set of images while the kernels process the current one.
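The copy/compute overlap mentioned above could look roughly like the sketch below. It assumes two streams (even and odd images), one device buffer per stream, and pinned host memory, which asynchronous copies need in order to actually overlap with kernels. `addOne` and `pipelineImages` are hypothetical names, and the kernel is just a placeholder for the real processing. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical placeholder for the real processing: add 1 to every pixel.
__global__ void addOne(int *image, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) image[i] += 1;
}

// Processes numImages images of n pixels each, alternating between two streams.
// Returns the first pixel of every processed image.
std::vector<int> pipelineImages(int numImages, int n)
{
    const int kStreams = 2;
    cudaStream_t streams[kStreams];
    int *dBuf[kStreams];

    int *hImages = nullptr;  // pinned host memory holding all images back to back
    cudaMallocHost(&hImages, (size_t)numImages * n * sizeof(int));
    for (int k = 0; k < numImages; ++k)
        for (int i = 0; i < n; ++i)
            hImages[(size_t)k * n + i] = k;  // toy data: image k is filled with k

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&dBuf[s], n * sizeof(int));
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    for (int k = 0; k < numImages; ++k) {
        int s = k % kStreams;  // even images go to stream 0, odd images to stream 1
        int *h = hImages + (size_t)k * n;
        // Operations in one stream run in order, so reusing dBuf[s] for the next
        // image of the same stream is safe. Work in the other stream may overlap.
        cudaMemcpyAsync(dBuf[s], h, n * sizeof(int), cudaMemcpyHostToDevice, streams[s]);
        addOne<<<blocks, threads, 0, streams[s]>>>(dBuf[s], n);
        cudaMemcpyAsync(h, dBuf[s], n * sizeof(int), cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for both streams before reading the results

    std::vector<int> firstPixels(numImages);
    for (int k = 0; k < numImages; ++k)
        firstPixels[k] = hImages[(size_t)k * n];

    for (int s = 0; s < kStreams; ++s) {
        cudaFree(dBuf[s]);
        cudaStreamDestroy(streams[s]);
    }
    cudaFreeHost(hImages);
    return firstPixels;
}
```

While stream 0 is running its kernel on one image, stream 1 is free to upload or download another. How much this helps depends on the ratio of transfer time to kernel time and on the copy engines the device has, so it is worth checking in a profiler.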
Upvotes: 1