Reputation: 5691
I am a bit confused about synchronization. With __syncthreads you can synchronize the threads in a block. Must this (the use of __syncthreads) be done only with shared memory? Or does using shared memory with __syncthreads simply give the best performance?
Upvotes: 0
Views: 2252
Reputation: 5930
The use of __syncthreads does not involve shared memory; it only ensures synchronization within a block. But you do need to synchronize threads when you want them to share data through shared memory.
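To make the "share data through shared memory" case concrete, here is a minimal sketch. The kernel name `reverseInBlock`, the host helper `runReverseDemo` and the size `N` are made up for illustration; a single block of 256 threads is assumed. Each thread reads an element that another thread wrote, so the barrier between the write and the read is what makes the result well defined.

```cuda
#include <cuda_runtime.h>

constexpr int N = 256;  // illustrative size: one block of N threads

// Each thread stores one element in shared memory, then reads an element
// that a *different* thread stored. The barrier separates the two phases.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];
    int t = threadIdx.x;

    tile[t] = data[t];          // phase 1: every thread stores its element
    __syncthreads();            // wait until all stores in the block are done
    data[t] = tile[N - 1 - t];  // phase 2: read another thread's element
}

// Hypothetical host helper: returns true if the array came back reversed.
bool runReverseDemo()
{
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, N>>>(dev);
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (host[i] != N - 1 - i) return false;
    return true;
}
```

Calling `runReverseDemo()` from your own `main` should return true on a working device. Without the `__syncthreads()` line the kernel has a race between the store and the load, and the result is no longer guaranteed.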
We don't always use shared memory because it is quite small, and because it can slow down your application when used badly. This is due to potential bank conflicts when shared memory is addressed poorly. Moreover, recent architectures (compute capability 2.0 and later) implement shared memory in the same on-chip hardware as the L1 cache. Thus, some seasoned CUDA developers recommend not using shared memory and relying on the cache mechanisms only.
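As an illustration of the bank-conflict issue, here is a sketch of the common padding trick on a tiled matrix transpose. It assumes 32 banks of 4-byte words, which holds on most current GPUs but is an assumption here; the names `transposeTile`, `runTransposeDemo` and `TILE` are hypothetical. With a row length of 32 floats, the column-wise read in the second phase would hit the same bank for every thread in a warp; padding each row to 33 floats spreads those reads over different banks.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 32;  // tile edge, chosen to match an assumed 32 banks

__global__ void transposeTile(const float *in, float *out, int n)
{
    // The +1 padding shifts each row by one bank, so reading a column of
    // the tile no longer maps every thread of a warp to the same bank.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) tile[threadIdx.y][threadIdx.x] = in[y * n + x];

    __syncthreads();  // the whole tile must be loaded before it is read back

    // Swap the block indices so global writes stay row-contiguous.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: transposes a 64x64 matrix and checks the result.
bool runTransposeDemo()
{
    const int n = 64;
    std::vector<float> in(n * n), out(n * n, -1.0f);
    for (int i = 0; i < n * n; ++i) in[i] = static_cast<float>(i);

    size_t bytes = in.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTile<<<grid, block>>>(dIn, dOut, n);

    cudaError_t err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            if (out[x * n + y] != in[y * n + x]) return false;
    return true;
}
```

The kernel gives the same result with `tile[TILE][TILE]`; only the access pattern, and therefore the speed, changes. A profiler is the reliable way to see whether the conflicts matter for a given kernel.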
Can be. If you want to know whether it is a deadlock, try increasing the number of blocks you're using. If it is a deadlock, your GPU should freeze. If it is not, post your code; it will be easier for us to answer ;)
Upvotes: 2
Reputation: 4115
Although shared memory and __syncthreads() are independent concepts, they often go hand in hand. If threads operate independently, there is no need to use __syncthreads().
Two aspects restrict the use of shared memory: 1) the size of shared memory is limited; 2) to achieve the best performance, you need to avoid bank conflicts when using shared memory.
It could be due to the lack of __syncthreads(). Sometimes, using shared memory without __syncthreads() could lead to unpredictable results.
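A typical place where a missing barrier gives unpredictable results is a per-block sum in shared memory. The sketch below is only an illustration; `blockSum`, `runBlockSumDemo` and `BLOCK` are hypothetical names, and the block size is assumed to be a power of two. Every step of the loop reads values that other threads wrote in the previous step, so a barrier is needed after each step.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // threads per block; assumed to be a power of two

// Each block sums BLOCK consecutive inputs into out[blockIdx.x].
__global__ void blockSum(const int *in, int *out)
{
    __shared__ int partial[BLOCK];
    int t = threadIdx.x;

    partial[t] = in[blockIdx.x * BLOCK + t];
    __syncthreads();  // all inputs must be in shared memory before summing

    // Halve the number of active threads each step; the barrier makes the
    // sums written in one step visible to the threads reading them next.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (t < stride) partial[t] += partial[t + stride];
        __syncthreads();
    }

    if (t == 0) out[blockIdx.x] = partial[0];
}

// Hypothetical host helper: sums 4 blocks of ones, expects BLOCK per block.
bool runBlockSumDemo()
{
    const int blocks = 4;
    std::vector<int> in(blocks * BLOCK, 1), out(blocks, 0);

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, in.size() * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, out.size() * sizeof(int)) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), in.size() * sizeof(int), cudaMemcpyHostToDevice);

    blockSum<<<blocks, BLOCK>>>(dIn, dOut);

    cudaError_t err = cudaMemcpy(out.data(), dOut, out.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < blocks; ++b)
        if (out[b] != BLOCK) return false;
    return true;
}
```

If either `__syncthreads()` is removed, the kernel may still produce the right sums on some runs and wrong ones on others, which matches the "unpredictable results" described above.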
Upvotes: 0
Reputation: 579
__syncthreads() and shared memory are independent ideas; you don't need one to use the other. The only requirement for using __syncthreads() that comes to my mind is that all the threads of the block must eventually reach that point in the code, otherwise your program will simply hang.
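The "all threads must arrive" requirement is easy to break with a bounds check. Here is a small sketch of the safe pattern, with made-up names (`neighborInBlock`, `runNeighborDemo`, `BLOCK`): out-of-range threads skip the work but still reach the barrier, instead of returning early.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 128;  // threads per block (illustrative)

// out[i] receives the value of the right-hand neighbour inside the same
// block, or the thread's own value when there is no such neighbour.
__global__ void neighborInBlock(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK];
    int i = blockIdx.x * BLOCK + threadIdx.x;

    // Do not `return` here when i >= n: those threads would never reach
    // the barrier below. Guard the work, not the barrier.
    if (i < n) tile[threadIdx.x] = in[i];

    __syncthreads();  // reached by every thread of the block

    if (i < n) {
        int next = threadIdx.x + 1;
        out[i] = (next < BLOCK && i + 1 < n) ? tile[next] : tile[threadIdx.x];
    }
}

// Hypothetical host helper: n is deliberately not a multiple of BLOCK.
bool runNeighborDemo()
{
    const int n = 300;
    std::vector<int> in(n), out(n, -1);
    for (int i = 0; i < n; ++i) in[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    neighborInBlock<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(dIn, dOut, n);

    cudaError_t err = cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        bool hasNeighbor = (i % BLOCK != BLOCK - 1) && (i + 1 < n);
        if (out[i] != (hasNeighbor ? i + 1 : i)) return false;
    }
    return true;
}
```

The same rule applies to any branch: a `__syncthreads()` inside an `if` is only safe when the condition evaluates the same way for every thread in the block.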
As for shared memory, yes, it's probably a matter of size that you don't see it being used all the time. From my understanding, the shared memory of an SM is split among the blocks resident on it. For example, a kernel that uses 1 KB of shared memory per block would need 100 KB for 100 blocks resident at once, which exceeds what is available on the SM, so fewer blocks can run on it at the same time.
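If you want actual numbers for your device rather than a guess, something like the sketch below could help. It assumes device 0 and uses a placeholder kernel (`usesDynamicShared`) and a hypothetical helper (`residentBlocksPerSm`); `cudaGetDeviceProperties` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` are runtime API calls. The occupancy call also accounts for registers and thread limits, so its answer can be lower than the shared-memory budget alone would suggest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel that uses dynamically sized shared memory.
// It is only queried below, never launched.
__global__ void usesDynamicShared(float *out)
{
    extern __shared__ float scratch[];
    scratch[threadIdx.x] = static_cast<float>(threadIdx.x);
    __syncthreads();
    if (out != nullptr)
        out[blockIdx.x * blockDim.x + threadIdx.x] = scratch[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical helper: prints the shared-memory sizes of device 0 and
// returns how many blocks of the kernel can be resident on one SM,
// or -1 if a runtime call fails.
int residentBlocksPerSm(int threadsPerBlock, size_t sharedBytesPerBlock)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    std::printf("shared memory per SM:    %zu bytes\n", prop.sharedMemPerMultiprocessor);
    std::printf("shared memory per block: %zu bytes (default limit)\n", prop.sharedMemPerBlock);

    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, usesDynamicShared, threadsPerBlock, sharedBytesPerBlock);
    if (err != cudaSuccess) return -1;

    std::printf("resident blocks per SM:  %d (%d threads, %zu bytes shared each)\n",
                numBlocks, threadsPerBlock, sharedBytesPerBlock);
    return numBlocks;
}
```

For the 1 KB example above, `residentBlocksPerSm(128, 1024)` reports how many such blocks fit on one SM of your card. Blocks beyond that number are not an error; they wait until a resident block finishes.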
Upvotes: 0