Reputation: 97
Is it possible to prevent a memory address from being accessed by other threads for some period? For example:
__global__ void func(int* a){
    // other computation
    __lock_address(a);
    a[0] += threadIdx.x;
    __unlock_address(a);
}
The first thread that finishes the other computations and reaches __lock_address will lock that memory address until __unlock_address is called; any other thread that reaches __lock_address will have to wait until the first thread unlocks it.
The above example is basically equivalent to atomicAdd, but what if I want to do a more complicated computation rather than a simple addition?
Edit: mutex is initialized to 0, a is initialized to -1
__global__ void func(int *a, int *mutex){
    a[0] = atomicCAS(mutex, 0, 1); // a[0] = 1
}
If I do this, a[0] is equal to 1, but it should be 0, since that is the old value of mutex.
__global__ void func(int *a, int *mutex){
    a[0] = mutex[0]; // a[0] = 0
}
This is a sanity check: the value at a[0] is now 0, which means mutex is initialized to 0 correctly.
Upvotes: 0
Views: 238
Reputation: 97
Ok, I figured out what's wrong. Essentially, only the first thread in the thread block gets the old value 0 while simultaneously setting mutex to 1. The other threads read mutex after the first thread has already set it to 1, and then get stuck in a deadlock. I found this solution that worked for me.
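To check that diagnosis, here is a minimal sketch that records what each thread gets back from atomicCAS instead of having every thread overwrite the same a[0]. The names record_cas, seen and count_cas_winners are hypothetical and only used for illustration. Since mutex starts at 0 and is never reset, exactly one thread should observe the old value 0; in the one-line kernel from the question every thread also writes its own return value to a[0], so the final value is whichever write lands last, which is most likely a 1.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread stores the value it observed from atomicCAS in its own slot,
// so no thread overwrites another thread's result.
__global__ void record_cas(int *mutex, int *seen) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    seen[gid] = atomicCAS(mutex, 0, 1);  // returns the value of *mutex before the swap
}

// Hypothetical host helper: returns how many threads observed the old value 0,
// or -1 if a CUDA call failed.
int count_cas_winners(int blocks, int threads) {
    const int n = blocks * threads;
    int *d_mutex = nullptr;
    int *d_seen = nullptr;
    if (cudaMalloc(&d_mutex, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_seen, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_mutex);
        return -1;
    }
    cudaMemset(d_mutex, 0, sizeof(int));  // mutex starts unlocked (0)

    record_cas<<<blocks, threads>>>(d_mutex, d_seen);

    std::vector<int> h_seen(n, -1);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_seen.data(), d_seen, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_mutex);
    cudaFree(d_seen);
    if (err != cudaSuccess) return -1;

    int winners = 0;
    for (int v : h_seen) {
        if (v == 0) ++winners;  // this thread saw the unlocked state
    }
    return winners;
}
```

Calling count_cas_winners(1, 32) from host code should report a single winner; every other thread sees 1 because the lock has already been taken.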
Upvotes: 0
Reputation: 557
You can use a mutex to protect multithreaded access to the memory region. The CUDA Programming Guide has a nice example of using atomic operations to implement one (https://docs.nvidia.com/cuda/cuda-c-programming-guide/#scheduling-example):
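__device__ void mutex_lock(unsigned int *mutex) {
    unsigned int ns = 8;
    // Spin until this thread swaps the mutex from 0 (free) to 1 (held).
    while (atomicCAS(mutex, 0, 1) == 1) {
        __nanosleep(ns);  // back off before retrying
        if (ns < 256) {
            ns *= 2;
        }
    }
}
__device__ void mutex_unlock(unsigned int *mutex) {
    atomicExch(mutex, 0);  // release the mutex
}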
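As a rough sketch of how the lock could guard something more involved than a plain addition, the kernel below does a read-modify-write with a modulo inside the critical section. Several things here are my assumptions rather than part of the guide's example: the added __threadfence() calls, the volatile access to a, the modulus 1000, and the host helper run_locked_update. __nanosleep needs compute capability 7.0 or newer, and on older GPUs without independent thread scheduling a spin lock contended by threads of the same warp can hang.

```cuda
#include <cuda_runtime.h>

__device__ void mutex_lock(unsigned int *mutex) {
    unsigned int ns = 8;
    while (atomicCAS(mutex, 0, 1) == 1) {
        __nanosleep(ns);  // requires compute capability 7.0+
        if (ns < 256) {
            ns *= 2;
        }
    }
    __threadfence();  // assumption: order later reads after acquiring the lock
}

__device__ void mutex_unlock(unsigned int *mutex) {
    __threadfence();  // assumption: make protected writes visible before release
    atomicExch(mutex, 0);
}

__global__ void func(int *a, unsigned int *mutex) {
    // other computation would go here
    mutex_lock(mutex);
    volatile int *va = a;  // volatile so the value is re-read from memory
    int old = va[0];
    va[0] = (old + (int)threadIdx.x) % 1000;  // not expressible as one atomicAdd
    mutex_unlock(mutex);
}

// Hypothetical host helper: runs the kernel and returns a[0], or -1 on error.
int run_locked_update(int blocks, int threads) {
    int *d_a = nullptr;
    unsigned int *d_mutex = nullptr;
    int h_a = 0;
    if (cudaMalloc(&d_a, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_mutex, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_a);
        return -1;
    }
    cudaMemset(d_a, 0, sizeof(int));                // a[0] starts at 0
    cudaMemset(d_mutex, 0, sizeof(unsigned int));   // mutex starts unlocked

    func<<<blocks, threads>>>(d_a, d_mutex);

    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_a, d_a, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_a);
    cudaFree(d_mutex);
    return err == cudaSuccess ? h_a : -1;
}
```

Because modular addition does not depend on the order in which threads enter the critical section, the result of run_locked_update(blocks, threads) should equal the sum of all threadIdx.x values modulo 1000, which makes it easy to check on the host. Note that a single global lock serializes every thread, so where a built-in atomic fits the update it will usually be much faster.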
Upvotes: 3