The following function computes the square matrix arrayrmsd, but the result differs depending on the value of export OMP_NUM_THREADS=x (x = number of cores used). The differences are small but significant and cannot be due to numerical approximations (IMHO).
arrayrmsd is a matrix of nmol x nmol elements. Note: arrayrmsd[i,j] == arrayrmsd[i * nmol + j].
What I wanted to do is run the optimize() function, which sits in the inner for loop, in parallel. This function compares the molecular structure mol1c with all the structures contained in arrcar[] and stores the numerical difference in arrayrmsd[i,j].
Because the first parameter of the optimize() function is modified, I need an array mol_omp[0..numthr] whose entries are each used exclusively by the assigned thread.
The idea here is to run in parallel only the for loop that contains the optimize() call (the code after #pragma omp for). Before the #pragma omp parallel region I initialized the corresponding mol_omp[i] for every thread.
I am new to OpenMP, and I cannot understand where my code is wrong:
void create_rmsd_matrix(double *arrayrmsd, int nmol, CRISTALLO *mol1c, CRISTALLO *arrcar, double tolerance) {
    int i, k, numthr;
    CRISTALLO *mol_omp;

    for (i = 0; i < nmol; i++)
        for (k = 0; k < nmol; k++)
            arrayrmsd[i * nmol + k] = 0.0;

    /* one scratch structure per thread */
    numthr = omp_get_max_threads();
    mol_omp = (CRISTALLO *) malloc(sizeof(*mol_omp) * numthr);
    for (i = 0; i < numthr; i++) {
        (mol_omp + i)->natoms = arrcar->natoms;
        (mol_omp + i)->num_symbols = arrcar->num_symbols;
        initmol(mol_omp + i);
    }

    #pragma omp parallel private(i,k) shared(nmol,arrayrmsd,arrcar,tolerance)
    {
        int tid = omp_get_thread_num();
        #pragma omp for private(i,k) schedule(static,1)
        for (i = 0; i < nmol; i++) {
            molcpy(mol_omp + tid, arrcar + i);
            printf("Processing struc N. %d by thread n %d\n", i + 1, tid);
            for (k = i + 1; k < nmol; k++) {
                arrayrmsd[i * nmol + k] = optimize(mol_omp + tid, arrcar + k, tolerance);
                printf("[ %5d , %5d ] = %.12lf (thread %d)\n", i + 1, k + 1, arrayrmsd[i * nmol + k], tid);
            }
        }
    }

    for (i = 0; i < numthr; i++) {
        free((mol_omp + i)->list_atomi);
        free((mol_omp + i)->num_atomi);
        free((mol_omp + i)->x);
        free((mol_omp + i)->y);
        free((mol_omp + i)->z);
    }
    free(mol_omp);
}
There is a similar question, but I cannot understand whether it applies to my case. The main difference here is that I have a dynamic array that every thread must use independently.
The problem probably resides in a data race, but I cannot figure out where, because all the elements of arrayrmsd can be computed independently.
Where am I wrong? Thank you!
I tried several #pragma omp parallel options, but honestly I lack knowledge of OpenMP.
The serial code and the parallel code run with OMP_NUM_THREADS=1 give identical results, but with OMP_NUM_THREADS>1 the content of the arrayrmsd matrix differs.
Edit: the arrcar array is not modified. molcpy(mol1, mol2) simply copies the molecular structure mol2 into mol1:
void molcpy(CRISTALLO *mol1, CRISTALLO *mol2) {
    int i;
    mol1->natoms = mol2->natoms;
    mol1->tipo_coordinate = mol2->tipo_coordinate;
    strcpy(mol1->comment, mol2->comment);
    for (i = 0; i < 3; i++) {
        mol1->v1[i] = mol2->v1[i];
        mol1->v2[i] = mol2->v2[i];
        mol1->v3[i] = mol2->v3[i];
    }
    mol1->scale = mol2->scale;
    mol1->num_symbols = mol2->num_symbols;
    for (i = 0; i < mol1->num_symbols; i++) strcpy(&mol1->list_atomi[i * 3], &mol2->list_atomi[i * 3]);
    for (i = 0; i < mol2->num_symbols; i++) mol1->num_atomi[i] = mol2->num_atomi[i];
    for (i = 0; i < mol2->natoms; i++) {
        mol1->x[i] = mol2->x[i];
        mol1->y[i] = mol2->y[i];
        mol1->z[i] = mol2->z[i];
    }
}
UPDATE:
For the readers: the code I posted above is correct, and the #pragma directives are correct. As suspected, I found a very subtle data-race bug in one of the many functions that optimize calls. Sorry about that, and thanks to everybody for the comments; they pushed me to find the solution. I'm sorry I cannot post all the code: it is huge.