malxmusician212

Reputation: 113

GPU Kernel Blocksize/Gridsize without Threads

I'm currently programming some numerical methods on a GPU via PyCUDA/CUDA and am writing my own kernels. At some point, I need to estimate the error for at least 1000 coupled ODEs. I don't want to copy a couple of vectors with over 1000 entries back to the host, so I created a kernel (at the bottom of the post) that is a basic max function. The %(T)s and %(N)s are string substitutions I make at runtime, which should be irrelevant to this question (T represents a complex datatype and N represents the number of coupled ODEs).

My question is: there is no need for parallel computation here, so I do not use threads. When I call this function from Python, what should I specify for the block size and grid size?

    __global__ void get_error(double *max_error, %(T)s error_vec[1][%(N)s])
    {
        max_error[0] = error_vec[0][0].real();
        for (int ii = 0; ii < %(N)s; ii = ii + 1)
        {
            if (max_error[0] < error_vec[0][ii].real())
            {
                max_error[0] = error_vec[0][ii].real();
            }
        }
        return;
    }

Upvotes: 0

Views: 81

Answers (1)

Robert Crovella

Reputation: 151944

In a kernel launch, the total number of threads that will be spun up on the GPU is equal to the product of the grid size and block size specified for the launch.

Both of these values must be positive integers; therefore, the only combination that launches a single thread is a grid size of 1 and a block size of 1.

CUDA kernels are not required to make any specific reference to the built-in variables (e.g. blockIdx, threadIdx, etc.) but normally do so in order to differentiate behavior among threads. In the case where you have only one thread being launched, there's no particular reason to use these variables, and it's not necessary to do so.

A CUDA kernel launch of only a single thread is not a performant method for getting work done, but there may be specific cases where it is convenient to do so and does not have a significant performance impact on the application as a whole.
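As a concrete sketch of such a launch (variable names here are hypothetical; `d_max_error` and `d_error_vec` are assumed to be device allocations matching your kernel's parameters), the host-side CUDA C equivalent would be:

```cuda
// Launch the questioner's kernel with a grid of one block containing one
// thread: grid size 1, block size 1.
get_error<<<1, 1>>>(d_max_error, d_error_vec);
cudaDeviceSynchronize();   // wait for the single thread to finish
```

In PyCUDA, the same launch configuration is expressed through the `block` and `grid` keyword arguments of the compiled function, e.g. `func(max_error_gpu, error_vec_gpu, block=(1, 1, 1), grid=(1, 1))`.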

It's not obvious to me why your proposed kernel couldn't be recast as a thread-parallel kernel (it appears to be performing a max-finding reduction), but that seems to be separate from the point of your question.
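To illustrate what such a recast might look like, here is a hedged sketch of a block-wide max reduction over the real parts. It assumes %(N)s is a power of two no larger than 1024 (you would pad otherwise) and that the kernel is launched with a single block of %(N)s threads, e.g. `get_error_parallel<<<1, %(N)s>>>(...)`; the kernel name is illustrative, not from the original post:

```cuda
__global__ void get_error_parallel(double *max_error, %(T)s error_vec[1][%(N)s])
{
    __shared__ double sdata[%(N)s];
    int tid = threadIdx.x;

    // Each thread stages the real part of one entry in shared memory.
    sdata[tid] = error_vec[0][tid].real();
    __syncthreads();

    // Tree reduction: halve the number of active threads each iteration,
    // each surviving thread keeping the larger of its pair.
    for (int s = %(N)s / 2; s > 0; s >>= 1)
    {
        if (tid < s && sdata[tid + s] > sdata[tid])
        {
            sdata[tid] = sdata[tid + s];
        }
        __syncthreads();
    }

    // After the loop, thread 0 holds the maximum.
    if (tid == 0)
    {
        max_error[0] = sdata[0];
    }
}
```

This does O(N) work across N threads in O(log N) steps instead of one thread walking the whole array serially.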

Upvotes: 1
