Krcn U

Reputation: 421

Update a hash map inside OpenMP for loop

I am trying to use OpenMP for a for loop in which I insert/update a hash map (std::unordered_map).

The hash map and the buffer are members of a class, so I assigned pointers to their addresses. The key is a hash value returned by a global function.

The following seems the easiest way of doing this, but the hash map is not updated correctly. Something is wrong, but I am not sure how to fix it. Thanks in advance.

void MyClass::ProcessBuffer(void)
{
    omp_set_num_threads(4);
    std::unordered_map<unsigned long long,unsigned int>* hashptr=&m_sequencehash;
    std::vector<std::string>* bufferptr=&m_buffer;
    unsigned int sizevec=m_kmer_size;
    size_t i;
    #pragma omp parallel for
    for (i=0; i<READSTR_BUF_SIZE;++i)
    {
        ++(*hashptr)[_hash((*bufferptr)[i],sizevec)];
    }

}

Upvotes: 4

Views: 1708

Answers (1)

Liran Funaro

Reputation: 2878

The easiest way to solve this is to create a new map for each thread, then reduce them into a single map sequentially. This is a classic map-reduce scenario.

// omp_get_num_threads() returns 1 outside a parallel region,
// so query the maximum number of threads instead.
int s = omp_get_max_threads();
std::vector<std::unordered_map<unsigned long long,unsigned int>> res(s);

// Map step: each thread only touches its own map, so no locking is needed.
#pragma omp parallel for
for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
{
    int t = omp_get_thread_num();
    res[t][_hash((*bufferptr)[i], sizevec)]++;
}

// Reduce step: merge the per-thread maps into the shared map sequentially.
for (int t = 0; t < s; t++) {
    for (const auto& r : res[t]) {
        (*hashptr)[r.first] += r.second;
    }
}

Performing the reduction concurrently might be dangerous because the threads would still be accessing the same map concurrently. If you don't know the implementation of the map, you can't know for sure that this is safe.
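If you do want to keep the merge inside the parallel region, one option (a minimal sketch reusing the question's hashptr, bufferptr, sizevec and _hash, not something this answer requires) is to guard the merge with an OpenMP critical section, which keeps the map step parallel while serializing access to the shared map:

// Sketch: per-thread local maps merged inside the parallel region.
// The critical section serializes the merge, so correctness does not
// depend on std::unordered_map being safe for concurrent writes.
#pragma omp parallel
{
    std::unordered_map<unsigned long long,unsigned int> local;

    #pragma omp for
    for (size_t i = 0; i < READSTR_BUF_SIZE; ++i)
        local[_hash((*bufferptr)[i], sizevec)]++;

    #pragma omp critical
    {
        for (const auto& r : local)
            (*hashptr)[r.first] += r.second;
    }
}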

Alternatively, you can partition the hash values between different maps by putting different hash intervals in different buckets. This prevents different threads from accessing the same hash value in the reduce step. However, on a small dataset, it is hard to find a good partition function with a small number of buckets, and using too many buckets might add significant overhead compared to the serial approach.
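As an illustration of the bucketing idea (a hypothetical sketch: NUM_BUCKETS and the modulo partition are arbitrary choices, the other identifiers come from the question, and <vector>/<array> are assumed to be included):

// Hypothetical sketch: partition keys by hash into fixed buckets so the
// reduce step can also run in parallel without two threads sharing a key.
constexpr int NUM_BUCKETS = 16;  // arbitrary; tune for your data
int nthreads = omp_get_max_threads();

// One map per (thread, bucket) pair during the map step.
std::vector<std::array<std::unordered_map<unsigned long long,unsigned int>, NUM_BUCKETS>>
    partial(nthreads);

#pragma omp parallel for
for (size_t i = 0; i < READSTR_BUF_SIZE; ++i) {
    unsigned long long h = _hash((*bufferptr)[i], sizevec);
    partial[omp_get_thread_num()][h % NUM_BUCKETS][h]++;
}

// Reduce step, parallel over buckets: each bucket is merged by one thread
// and ends up in its own result map, so no key is touched by two threads.
std::array<std::unordered_map<unsigned long long,unsigned int>, NUM_BUCKETS> result;
#pragma omp parallel for
for (int b = 0; b < NUM_BUCKETS; ++b)
    for (int t = 0; t < nthreads; ++t)
        for (const auto& r : partial[t][b])
            result[b][r.first] += r.second;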

Upvotes: 4
