Reputation: 765
I have a simple question. I am allocating n blocks of memory, each associated with its own CUDA stream, like this (simplified; it may be a very bad idea -_-):
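void *ptr = NULL;
cudaStream_t stream;
cudaStreamCreate(&stream);              // the stream must exist before memory is attached to it
cudaMallocManaged(&ptr, size);          // managed allocation, globally visible by default
cudaStreamAttachMemAsync(stream, ptr);  // restrict visibility to this one stream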
Later in my code I call my kernel with 6 of these blocks of memory (chosen by a random process). The kernel launch takes only one stream argument:
update_gpu<<<256, 256,0,???>>>(block1,block2,block3,block4,block5,block6);
??? should be a stream, but which one should I pass? I could synchronize with
cudaDeviceSynchronize()
but that may be too much, as I have a lot of blocks.
cudaStreamSynchronize(...)
looks like a solution. Should I call it for five of my streams?
Any suggestions?
best,
++t
Upvotes: 0
Views: 106
Reputation: 7245
Attaching memory to a specific stream is an optimization that tells the runtime this memory does not need to be visible to any stream other than the one it is attached to. If in doubt, just don't attach the memory to any stream, and it will be visible to all kernels. This is particularly the right approach if you don't use streams at all (which means all kernels are launched into the default stream).
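To make the "don't attach" option concrete, here is a minimal sketch. The kernel name touch_blocks, the block size, and the single stream s are placeholders invented for illustration, not taken from the question. It relies on cudaMallocManaged defaulting to cudaMemAttachGlobal, so a kernel launched into any stream can use the blocks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: increments two blocks.
__global__ void touch_blocks(float *a, float *b, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] += 1.0f;
        b[i] += 1.0f;
    }
}

int main() {
    const size_t n = 1 << 16;  // illustrative size
    float *a = nullptr, *b = nullptr;

    // No stream attachment: the default flag is cudaMemAttachGlobal,
    // so kernels in any stream may access these allocations.
    if (cudaMallocManaged(&a, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMallocManaged(&b, n * sizeof(float)) != cudaSuccess) return 1;
    for (size_t i = 0; i < n; ++i) { a[i] = 0.0f; b[i] = 1.0f; }

    cudaStream_t s;
    if (cudaStreamCreate(&s) != cudaSuccess) return 1;

    // Any stream works here because the memory is globally attached.
    unsigned int grid = (unsigned int)((n + 255) / 256);
    touch_blocks<<<grid, 256, 0, s>>>(a, b, n);

    // Wait for the GPU before the host reads managed memory again.
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;
    printf("a[0] = %.1f, b[0] = %.1f\n", a[0], b[0]);

    cudaStreamDestroy(s);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

The sketch uses cudaDeviceSynchronize() before the host read because that is the conservative choice for globally attached memory. Whether a narrower synchronization is enough depends on the device and on what else is running.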
If, however, you want to take advantage of this optimization, then by the time your kernel runs, all managed memory it uses must be either not attached to any stream at all, or attached to the specific stream the kernel is launched into.
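One way to meet that condition for a kernel taking six blocks is to re-attach the blocks to the stream you are about to launch into. The following is only a sketch under assumptions: the body of update_gpu, the CHECK macro, the names blk, owner and launch, and the block size are all hypothetical. It uses cudaStreamAttachMemAsync, which is ordered within its stream and replaces any previous association, and it drains each old stream first so nothing there is still using the block.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper for this sketch.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Stand-in for the question's kernel; the real body is not shown there.
__global__ void update_gpu(float *b1, const float *b2, const float *b3,
                           const float *b4, const float *b5,
                           const float *b6, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) b1[i] = b2[i] + b3[i] + b4[i] + b5[i] + b6[i];
}

int main() {
    const int kBlocks = 6;
    const size_t n = 1 << 12;  // illustrative size
    float *blk[kBlocks];
    cudaStream_t owner[kBlocks];

    // Setup as in the question: one stream per block.
    for (int k = 0; k < kBlocks; ++k) {
        CHECK(cudaStreamCreate(&owner[k]));
        CHECK(cudaMallocManaged(&blk[k], n * sizeof(float)));
        for (size_t i = 0; i < n; ++i) blk[k][i] = float(k + 1);
        CHECK(cudaStreamAttachMemAsync(owner[k], blk[k]));
    }

    // Pick the stream of the first block as the launch stream and move
    // the other five blocks over to it.
    cudaStream_t launch = owner[0];
    for (int k = 1; k < kBlocks; ++k) {
        CHECK(cudaStreamSynchronize(owner[k]));  // old stream is done with blk[k]
        CHECK(cudaStreamAttachMemAsync(launch, blk[k], 0, cudaMemAttachSingle));
    }

    // The attach operations are queued in `launch`, so they take effect
    // before this kernel, which is queued behind them in the same stream.
    unsigned int grid = (unsigned int)((n + 255) / 256);
    update_gpu<<<grid, 256, 0, launch>>>(blk[0], blk[1], blk[2],
                                         blk[3], blk[4], blk[5], n);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(launch));

    printf("blk[0][0] = %.1f\n", blk[0][0]);

    for (int k = 0; k < kBlocks; ++k) {
        CHECK(cudaFree(blk[k]));
        CHECK(cudaStreamDestroy(owner[k]));
    }
    return 0;
}
```

If the set of six blocks changes on every launch, this re-attachment has to be repeated each time. In that case leaving the blocks unattached may well be simpler, as suggested above.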
Upvotes: 1