Reputation: 421
I am trying to use OpenMP for a for loop in which I insert/update a hash map (std::unordered_map).
The hash map and keys are members of a class, so I assigned pointers to their addresses. The key is a hash value returned by a global function.
The following seems like the easiest way of doing this, but the hash map is not updated correctly. Something is wrong, but I am not sure how to fix it. Thanks in advance.
void MyClass::ProcessBuffer(void)
{
    omp_set_num_threads(4);
    std::unordered_map<unsigned long long, unsigned int>* hashptr = &m_sequencehash;
    std::vector<std::string>* bufferptr = &m_buffer;
    unsigned int sizevec = m_kmer_size;
    size_t i;
    #pragma omp parallel for
    for (i = 0; i < READSTR_BUF_SIZE; ++i)
    {
        ++(*hashptr)[_hash((*bufferptr)[i], sizevec)];
    }
}
Upvotes: 4
Views: 1708
Reputation: 2878
The easiest way to solve this is to create a separate map for each thread and then reduce them into a single map sequentially. This is a classic map-reduce scenario.
// omp_get_num_threads() returns 1 outside a parallel region;
// omp_get_max_threads() gives an upper bound on the thread count.
int s = omp_get_max_threads();
std::vector<std::unordered_map<unsigned long long, unsigned int>> res(s);

// Map step: each thread writes only to its own map, so no locking is needed.
#pragma omp parallel for
for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
{
    int t = omp_get_thread_num();
    res[t][_hash((*bufferptr)[i], sizevec)]++;
}

// Reduce step: merge the per-thread maps into the shared map sequentially.
for (int i = 0; i < s; i++) {
    for (const auto& r : res[i]) {
        (*hashptr)[r.first] += r.second;
    }
}
Performing the reduction concurrently would be dangerous, because you would again be mutating the same map from several threads. std::unordered_map gives no thread-safety guarantees for concurrent writes, so you can't assume this is safe.
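If you do want to keep the merge inside the parallel region, one conventional pattern (a sketch only, reusing _hash, bufferptr, sizevec, and READSTR_BUF_SIZE from the question) is to give each thread a private map and serialize just the merge with a critical section:

// Sketch: merge each thread's private map into the shared map inside the
// parallel region. The critical section serializes the merges, so the
// shared unordered_map is never mutated by two threads at once.
#pragma omp parallel
{
    std::unordered_map<unsigned long long, unsigned int> local;

    #pragma omp for
    for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
        ++local[_hash((*bufferptr)[i], sizevec)];

    #pragma omp critical
    for (const auto& r : local)
        (*hashptr)[r.first] += r.second;
}

The counting work still runs fully in parallel; only the final merges queue up behind each other, which is usually cheap compared to the map step.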
Alternatively, you can partition the hash values across several maps by assigning different hash intervals to different buckets. This prevents two threads from ever touching the same hash value, so the maps never need to be merged in the reduce step. However, on a small dataset it is hard to find a good partition function with a small number of buckets, and using too many buckets can add significant overhead compared to the serial approach.
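A minimal sketch of that partitioning idea, again reusing the names from the question and taking modulo over the bucket count as a (deliberately simple, hypothetical) partition function:

// Sketch: bucket b owns every key whose hash satisfies h % nbuckets == b.
// Each map is only ever written by the one iteration that owns it, so no
// merging or locking is needed.
int nbuckets = omp_get_max_threads();
std::vector<std::unordered_map<unsigned long long, unsigned int>> buckets(nbuckets);

#pragma omp parallel for
for (int b = 0; b < nbuckets; ++b)
{
    for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
    {
        unsigned long long h = _hash((*bufferptr)[i], sizevec);
        if (h % nbuckets == (unsigned long long)b)
            ++buckets[b][h];
    }
}

Note that every bucket re-scans (and re-hashes) the whole input, which is exactly the kind of overhead mentioned above; precomputing the hashes once, or choosing a cheaper partition function, would mitigate this.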
Upvotes: 4