Reputation: 47
Let arr be declared as follows:
std::vector<std::vector<int>> arr(10'000'000);
My serial code is something like
for (const auto [x, y] : XY) {
    arr[x].push_back(y);
}
I use OpenMP and define an array of locks as follows:
std::vector<omp_lock_t> locks(10'000'000);
With these locks, I use
#pragma omp parallel for schedule(dynamic)
for (const auto [x, y] : XY) {
    omp_set_lock(&locks[x]);
    arr[x].push_back(y);
    omp_unset_lock(&locks[x]);
}
Such an approach works on my machine (Windows Subsystem for Linux). But then I found the following post:
c/c++ maximum number of mutexes allowed in Linux
which raises the concern that I may have used too many locks and that my program may not work on other platforms (those with a limit on the number of locks allowed).
I wonder if there is another approach that gives me the same locking granularity as above but has no upper limit on the number of locks. Can I use something like compare-and-swap?
Upvotes: 1
Views: 459
Reputation: 23517
According to the OpenMP documentation, omp_lock_t represents "a simple lock". I suppose it is some kind of spinlock. You therefore don't need to care about limits on the number of mutexes. (Mutexes require some interaction with the kernel/scheduler, which is likely the reason for restricting their count.)
omp_lock_t uses a single memory location, which is fine for a spinlock. For instance, this live demo shows that omp_lock_t takes only 4 bytes, while std::mutex takes 40 bytes: https://godbolt.org/z/sr845h8hx.
Note that a spinlock can be implemented with a single byte, or even with a single bit (BTS on x86_64). You can therefore compress your locks much further in memory. However, I'd be careful with such an approach, since it can bring significant false-sharing issues.
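For illustration, here is a minimal sketch of such a compact lock array built on one-byte std::atomic flags (the ByteLockArray name and the memory orderings are illustrative choices, not taken from any particular library):

#include <atomic>
#include <cstddef>
#include <vector>

// One 1-byte lock per element: 10'000'000 locks take ~10 MB instead of the
// ~40 MB needed for omp_lock_t.  Note that 64 neighbouring locks then share
// a cache line, which is exactly the false-sharing hazard mentioned above.
struct ByteLockArray {
    std::vector<std::atomic<unsigned char>> flag;  // 0 = free, 1 = held

    explicit ByteLockArray(std::size_t n) : flag(n) {}  // elements value-initialized to 0

    void lock(std::size_t i) {
        while (flag[i].exchange(1, std::memory_order_acquire) != 0)
            ;  // naive busy-wait; see the test-and-test-and-set note below
    }
    void unlock(std::size_t i) {
        flag[i].store(0, std::memory_order_release);
    }
};

Usage inside the parallel loop would then be locks.lock(x); arr[x].push_back(y); locks.unlock(x); with ByteLockArray locks(10'000'000); declared up front.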
That said, I think your solution is fine. The operation in the critical section should be very fast, and the probability that multiple threads access the same element at the same moment is likely low, so a spinlock is a suitable solution in my opinion.
EDIT
As pointed out in the comments, the term "simple lock" does not necessarily mean that the lock is actually a spinlock; it's up to the OpenMP implementation to decide which type of lock it uses. So, if you want to be sure that you are using a true spinlock (which I still believe is suitable here), you can either write your own or use a library that provides one. (Note that an efficient spinlock implementation is not that simple. For instance, it should spin on a load/read operation instead of on an exchange/test-and-set, which is what naive implementations often do.)
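For reference, here is a minimal test-and-test-and-set sketch in plain C++ (an illustrative example, not the implementation used by any particular OpenMP runtime):

#include <atomic>

// Test-and-test-and-set spinlock: spin on a cheap load and only attempt the
// atomic exchange once the lock looks free, instead of hammering the cache
// line with read-modify-write operations.
class Spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        for (;;) {
            // Try to acquire; exchange returns the previous value.
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // The lock is held: wait on a relaxed load until it looks free.
            while (locked_.load(std::memory_order_relaxed)) {
                // A pause/yield hint could be added here.
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};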
Upvotes: 2
Reputation: 2869
A couple of possible solutions:
If you are on a modern Intel processor that supports Transactional Synchronization Extensions (TSX), you could use a single, speculative lock. (See [Two Small OpenMP Features You May Have Missed][1], which shows a very similar use case.)
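A rough sketch of that single-speculative-lock idea is below. It assumes an OpenMP 4.5+ runtime that provides omp_init_lock_with_hint and a TSX-capable CPU; the hint is only a hint, so a runtime without lock elision simply falls back to an ordinary lock. The fill wrapper and the std::pair element type of XY are illustrative assumptions.

#include <omp.h>
#include <cstddef>
#include <utility>
#include <vector>

// One lock for the whole loop, initialised with a speculative hint.  A
// TSX-capable runtime may elide it so that non-conflicting push_backs run in
// parallel, retrying non-speculatively only on an actual conflict.
void fill(std::vector<std::vector<int>>& arr,
          const std::vector<std::pair<int, int>>& XY) {
    omp_lock_t lock;
    omp_init_lock_with_hint(&lock, omp_lock_hint_speculative);

    #pragma omp parallel for schedule(dynamic)
    for (std::size_t i = 0; i < XY.size(); ++i) {
        const auto [x, y] = XY[i];
        omp_set_lock(&lock);    // elided speculatively when there is no conflict
        arr[x].push_back(y);
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
}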
You could use a smaller array of locks and a hash to map from the array index into the lock set. Something like this (untested and uncompiled!):
// Allocate and initialise the lock array, wherever you have that!
enum { ln2NumLocks = 10,
       NumLocks = 1 << ln2NumLocks };
omp_lock_t locks[NumLocks];
for (int i = 0; i < NumLocks; i++)
    omp_init_lock(&locks[i]);
// Hash from array index to lock index. What you have here
// probably doesn't matter too much. It doesn't need to be
// a crypto-hash!
int mapToLockIdx(int arrayIdx) {
    return ((arrayIdx >> ln2NumLocks) ^ arrayIdx) & (NumLocks - 1);
}
// Your loop is then something like this
#pragma omp parallel for schedule(dynamic)
for (const auto [x, y] : XY) {
    auto lock = &locks[mapToLockIdx(x)];
    omp_set_lock(lock);
    arr[x].push_back(y);
    omp_unset_lock(lock);
}
[1]: https://www.openmp.org/wp-content/uploads/SC18-BoothTalks-Cownie.pdf
Upvotes: 0