I am solving the minimum dominating set problem in CUDA. Every thread finds a local candidate result, and I need to find the best one among them. I am using __device__ variables for the global result (dev_bestConfig and dev_bestValue).
I need to do something like this:
__device__ configType dev_bestConfig = 0;
__device__ int dev_bestValue = INT_MAX;

__device__ void findMinimalDominantSet(int count, const int *matrix, Lock &lock)
{
    // here is some algorithm that finds local bestValue and bestConfig

    // set device variables
    if (bestValue < dev_bestValue)
    {
        dev_bestValue = bestValue;
        dev_bestConfig = bestConfig;
    }
}
I know that this does not work, because multiple threads access the memory at the same time, so I use this critical section:
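// set device variables
bool isSet = false;
do
{
    // try to take the lock; atomicCAS returns the previous value of the mutex
    isSet = (atomicCAS(lock.mutex, 0, 1) == 0);
    if (isSet)
    {
        // critical section goes here
        if (bestValue < dev_bestValue)
        {
            dev_bestValue = bestValue;
            dev_bestConfig = bestConfig;
        }
        __threadfence();           // make the writes visible before releasing
        atomicExch(lock.mutex, 0); // release the lock
    }
} while (!isSet);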
This works as expected, but it is really slow. For example, without the critical section it takes 0.1 seconds, and with the critical section it takes 1.8 seconds.
What can I do differently to make it faster?
Upvotes: 0
Views: 126
In the end I avoided critical sections and locking altogether. Each thread saves its local result to an array, and the array is then searched for the best one. The search can be done sequentially or by parallel reduction.
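A minimal sketch of that approach, under some assumptions: the question does not show configType, so it is taken to be unsigned long long here, and Candidate, localSearch, searchKernel, reduceKernel and findBest are hypothetical names. localSearch is a stand-in for the real search and only produces a fake value per thread. The reduction uses a single block of 256 threads, which should be enough for moderately sized arrays.

```cuda
#include <climits>
#include <cuda_runtime.h>

typedef unsigned long long configType; // assumption: not shown in the question

struct Candidate {
    int value;
    configType config;
};

// Hypothetical stand-in for the real per-thread search.
// It fakes a result whose value is smallest for thread 297.
__device__ Candidate localSearch(int tid)
{
    Candidate c;
    c.value = (tid > 297 ? tid - 297 : 297 - tid) + 5;
    c.config = (configType)tid;
    return c;
}

// Step 1: every thread writes its local best into its own slot.
// No two threads touch the same element, so no lock is needed.
__global__ void searchKernel(Candidate *results, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        results[tid] = localSearch(tid);
}

// Step 2: a single block of 256 threads reduces the array to its minimum.
__global__ void reduceKernel(const Candidate *results, int n, Candidate *best)
{
    __shared__ Candidate s[256];

    // Each thread scans a strided slice of the array.
    Candidate mine;
    mine.value = INT_MAX;
    mine.config = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        if (results[i].value < mine.value)
            mine = results[i];
    s[threadIdx.x] = mine;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride &&
            s[threadIdx.x + stride].value < s[threadIdx.x].value)
            s[threadIdx.x] = s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        *best = s[0];
}

// Host helper: runs both steps for n threads and returns the best candidate.
// Error checking is omitted to keep the sketch short.
Candidate findBest(int n)
{
    Candidate *dResults, *dBest, hBest;
    cudaMalloc(&dResults, n * sizeof(Candidate));
    cudaMalloc(&dBest, sizeof(Candidate));
    searchKernel<<<(n + 255) / 256, 256>>>(dResults, n);
    reduceKernel<<<1, 256>>>(dResults, n, dBest);
    cudaMemcpy(&hBest, dBest, sizeof(Candidate), cudaMemcpyDeviceToHost);
    cudaFree(dResults);
    cudaFree(dBest);
    return hBest;
}
```

Calling findBest(1024) from host code runs both kernels and copies the winning candidate back. For a small number of threads, copying the whole array to the host and scanning it sequentially may be just as fast as the second kernel.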
Upvotes: 1