Fast way to find the maximum of a float array in OpenCL

Question

I'm having trouble with the simple task of finding the maximum of an array in OpenCL.

__kernel void ndft(/* lots of stuff*/)
{
    size_t thread_id = get_global_id(0); // thread_id = [0 .. spectrum_size[

    /* MATH MAGIC */

    // Now I have    float spectrum_abs[spectrum_size]    and
    // I want the maximum as well as the index holding the maximum

    barrier(CLK_GLOBAL_MEM_FENCE);
    // this is the old, sequential code (i was the loop index over spectrum_abs):
    if (*current_max_value < spectrum_abs[i])
    {
        *current_max_value = spectrum_abs[i];
        *current_max_freq = i;
    }
}

I could add if (thread_id == 0) and loop through the entire array as I would on a single-core system, but performance is critical (otherwise I wouldn't be doing spectrum calculations on a GPU). Is there a faster way to do this?

Returning to the CPU at the end of the kernel above is not an option, because the kernel actually continues after that.

Ani · Accepted Answer

You will need to write a parallel reduction. Split your "large" array into small pieces (each of a size a single workgroup can process efficiently) and compute the min/max of each piece.

Repeat this on the reduced results (this involves both host and device code) until you are left with only one set of min/max values.
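To make the iteration concrete, here is a minimal sketch in plain C rather than OpenCL, so it can be run on the host. It assumes each chunk of group_size entries stands in for one workgroup and each call to reduce_pass stands in for one kernel launch. The names MaxEntry, reduce_pass and find_max are made up for illustration; the index is carried along with the value because the question asks for both.

```c
#include <stddef.h>

/* A (value, index) pair, so the position of the maximum survives the reduction. */
typedef struct {
    float  value;
    size_t index;
} MaxEntry;

/* One pass: every chunk of group_size entries (standing in for one
   workgroup) is reduced to a single entry. Returns the number of entries
   written to out. Writing in place (out == in) is safe because out[g] is
   stored only after chunk g has been read, and later chunks start at
   higher positions. */
static size_t reduce_pass(const MaxEntry *in, size_t n, size_t group_size,
                          MaxEntry *out)
{
    size_t groups = (n + group_size - 1) / group_size;
    for (size_t g = 0; g < groups; ++g) {
        size_t begin = g * group_size;
        size_t end = (begin + group_size < n) ? begin + group_size : n;
        MaxEntry best = in[begin];
        for (size_t i = begin + 1; i < end; ++i) {
            if (in[i].value > best.value)
                best = in[i];
        }
        out[g] = best;
    }
    return groups;
}

/* Repeats passes until one entry is left. Assumes n >= 1, group_size >= 2,
   and that scratch has room for n entries. */
MaxEntry find_max(const float *data, size_t n, size_t group_size,
                  MaxEntry *scratch)
{
    for (size_t i = 0; i < n; ++i) {
        scratch[i].value = data[i];
        scratch[i].index = i;
    }
    while (n > 1)
        n = reduce_pass(scratch, n, group_size, scratch);
    return scratch[0];
}
```

On a device, the host would enqueue the reduction kernel once per pass, with a shrinking global size, instead of running the while loop itself.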

Note that you might need to write a separate kernel that does this unless the current work-distribution works for this piece of the kernel (see my question to you above).

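An alternative, if your current work distribution is amenable, is to find the min/max inside each workgroup and write it to a buffer in global memory (one slot per workgroup, indexed by group ID rather than local ID). After a barrier(), have the work-item with thread_id == 0 loop over the reduced results and find the max among them. Keep in mind that barrier() only synchronizes the work-items of a single workgroup, so the final loop is only safe once every workgroup's result has actually been written. This will not be the optimal solution, but it might be one that fits inside your current kernel.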
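The two stages of this alternative can be sketched as follows, again in plain C as a host-side simulation, not as a drop-in kernel. The assumptions: GROUP_SIZE is a hypothetical power-of-two workgroup size, the loops over lid stand in for the work-items of one group, and the local arrays stand in for __local memory. The function names are illustrative only.

```c
#include <math.h>    /* INFINITY */
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical workgroup size; must be a power of two */

/* Stage 1: tree reduction inside one workgroup. In a kernel, every pass of
   the stride loop would end with barrier(CLK_LOCAL_MEM_FENCE). */
static void group_max(const float *data, size_t n, size_t group_id,
                      float *out_val, size_t *out_idx)
{
    float  val[GROUP_SIZE];  /* stands in for __local memory */
    size_t idx[GROUP_SIZE];

    for (size_t lid = 0; lid < GROUP_SIZE; ++lid) {
        size_t gid = group_id * GROUP_SIZE + lid;
        /* Slots past the end of the array are padded so they never win. */
        val[lid] = (gid < n) ? data[gid] : -INFINITY;
        idx[lid] = (gid < n) ? gid : 0;
    }
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            if (val[lid + stride] > val[lid]) {
                val[lid] = val[lid + stride];
                idx[lid] = idx[lid + stride];
            }
        }
    }
    *out_val = val[0];
    *out_idx = idx[0];
}

/* Stage 2: a single work-item scans the per-group results. The partial_*
   buffers stand in for global memory and need one slot per workgroup.
   Assumes n >= 1. Returns the index of the maximum. */
size_t argmax_two_stage(const float *data, size_t n,
                        float *partial_val, size_t *partial_idx,
                        float *max_out)
{
    size_t num_groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    for (size_t g = 0; g < num_groups; ++g)
        group_max(data, n, g, &partial_val[g], &partial_idx[g]);

    size_t best = 0;
    for (size_t g = 1; g < num_groups; ++g) {
        if (partial_val[g] > partial_val[best])
            best = g;
    }
    *max_out = partial_val[best];
    return partial_idx[best];
}
```

The final scan costs one comparison per workgroup, which is why this is simpler but less parallel than repeating the reduction until one value remains.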
