Yvain
Yvain

Reputation: 29

recursion, global variables, and OpenMP

I'm having troubles with a function that calls itself the proper way of threading with openmp is unclear to me on that case.

long *arrayofindex; 
int length, N;

void gen(long index)
{
    if(index == 0)
    {
        #pragma omp parallel for
        for(int a=0; a<N; ++a)
        {
            #pragma omp critical
            {
            gen(index+1);
            ++arrayofindex[index];
            }
        }
    }
    else
    {
        for(arrayofindex[index]=0; arrayofindex[index]<N; ++arrayofindex[index])
        {
            if(index < length-1)
                gen(index+1);
            else printf("%ld\n", arrayofindex[index]);
        }
    }
}

int main(){
    length = 5, N = 4;
    arrayofindex = (long*) malloc(length * sizeof(long));
    for(int i=0; i<length; ++i)
        arrayofindex[index] = 0;
    gen(0);
}

the way I did it, it runs several threads and is correct but doesn't seem to get any faster than without openmp support.

Is there an way through this ? or is it going to be helpless because of the way of the code.

The complete code https://github.com/e2002e/zhou

Upvotes: 0

Views: 282

Answers (1)

John Bollinger
John Bollinger

Reputation: 181084

As I understand the code presented, the general objective is essentially to generate all the N-letter words that can be formed with a length-letter alphabet. Each recursive call to gen() corresponds to one letter position, and so each time control reaches the bottom of the recursion, the first N elements of arrayofindex represent the letters of one word.

But it should be obvious that multiple threads running in parallel cannot use the same arrayofindex. Each thread expects when it reaches the bottom of the recursion to find in arrayofindex the values that it set along its recursive path. That's fundamental to the approach. If other threads are modifying arrayofindex at the same time then they all likely get mishmashes of values set by different threads. Moreover, you probably don't get anything like the speedup you hope for, because the threads need to synchronize their accesses to arrayofindex.


Note: This problem has nothing in particular to do with recursion. You would have exactly the same issue if you modified the code to be iterative instead of recursive -- which I, myself, would in fact do if I were looking to improve performance, though I do not demonstrate that here.


There are various ways to give each OMP thread its own working array. If the space must continue to be dynamically allocated, then you should arrange for it to be allocated inside the parallel region, so that each thread allocates its own. If you're willing and able to rely on variable-length arrays for this, however, then probably the only other thing you need is an OMP private clause attached to your parallel for construct.

For example, this variation on your code works for me:

void gen_tail(int length, int num_letters, int arrayofindex[], int position) {
    for (int letter = 0; letter < num_letters; letter++) {
        arrayofindex[position] = letter;
        if (position + 1 < length) {
            gen_tail(length, num_letters, arrayofindex, position + 1);
        } else {
            // this is the bottom ... do something with arrayofindex, such as:
            #pragma omp critical
            {
                for (int i = 0; i < length; i++) {
                    putchar('A' + arrayofindex[i]);
                }
                putchar('\n');
            }
        }
    }
}

void gen(int length, int num_letters) {
    assert(length > 1);
    int arrayofindex[length];  // Note: VLA, _not_ dynamically allocated

    // Marking the array 'private' means each thread gets its own copy.
    // This would not have the needed effect if 'arrayofindex' were a pointer.
    #pragma omp parallel for private(arrayofindex)
    for (int letter = 0; letter < num_letters; letter++) {
        arrayofindex[0] = letter;
        gen_tail(length, num_letters, arrayofindex, 1);
    }
}

int main(void) {
    gen(5, 4);
}

That emits the expected 1024 (== 45) results for me, all distinct, as I have every reason to expect it should do.

Upvotes: 2

Related Questions