Dmitry Zaytsev

Reputation: 23972

Need help optimizing function call

I'm trying to implement an image processing function. Here it is:

typedef void (*AgFilter)(int*, int*, int*, float*);

static void filter(AndroidBitmapInfo* info, void* pixels, AgFilter func, void* params){

    for(y = 0; y < height; y++){
        for(x = 0; x < width; x++){
            //initialize r, g, b

            func(&r, &g, &b, params); //here is the problem
        }
    }
}

I'm passing this function as func:

static inline void brightness(int *r, int *g, int *b, float* param){
    float add = param[0];

    *r += add;
    *g += add;
    *b += add;
}

The problem is that it runs extremely slowly. Well, I can understand that. But if, instead of passing the function by pointer, I write its body directly inside filter (in place of the func call), it runs much, much faster. Where is the problem?

P.S. Note that this is C, not C++.

EDIT

This one works fast:

static void filter(AndroidBitmapInfo* info, void* pixels, int add){

    for(y = 0; y < height; y++){
        for(x = 0; x < width; x++){
            //initialize r, g, b
            r += add;
            g += add;
            b += add;
        }
    }
}

Upvotes: 3

Views: 148

Answers (3)

Graham Borland

Reputation: 60711

By far the biggest improvement you can make is to avoid calling the function once for each pixel. It is trivial to move your loop inside the brightness function.

static inline void brightness(int *r, int *g, int *b, float* param){
    float add = param[0];

    for(y = 0; y < height; y++)
        for(x = 0; x < width; x++){
            //initialize r, g, b
            *r += add;
            *g += add;
            *b += add;
        }
}

Now, I know you don't want to have to duplicate the loop-iteration code inside every different filter function you might write, so this is one of the cases where using macros can really make a difference. Try something like this (untested).

#define FOR_EACH_PIXEL for(y = 0; y < height; y++) \
                       for(x = 0; x < width;  x++)

static inline void brightness(int *r, int *g, int *b, float* param){
    float add = param[0];

    FOR_EACH_PIXEL 
    {
            //initialize r, g, b
            *r += add;
            *g += add;
            *b += add;
    }

}
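For reference, here is a fuller sketch of the same macro idea that actually compiles. It is only an illustration under assumptions that are not in the question: pixels are packed as 0xAARRGGBB in a uint32_t buffer, the names brightness_all and clamp_channel are made up, and the clamping to 0..255 is my addition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a channel value to the 0..255 range. */
static inline int clamp_channel(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Iterates over every pixel of a width x height image.
   Expects variables named x, y, width and height in the enclosing scope. */
#define FOR_EACH_PIXEL \
    for (y = 0; y < height; y++) \
        for (x = 0; x < width; x++)

/* Brightness filter with the loop inside the filter itself, so there is
   no function call per pixel. The 0xAARRGGBB layout is an assumption. */
static void brightness_all(uint32_t *pixels, int width, int height, int add) {
    int x, y;
    assert(pixels != NULL);

    FOR_EACH_PIXEL {
        size_t i = (size_t)y * (size_t)width + (size_t)x;
        uint32_t p = pixels[i];

        /* initialize r, g, b */
        int r = (int)((p >> 16) & 0xFF);
        int g = (int)((p >> 8) & 0xFF);
        int b = (int)(p & 0xFF);

        r = clamp_channel(r + add);
        g = clamp_channel(g + add);
        b = clamp_channel(b + add);

        /* Keep the alpha byte, write the new colour channels back. */
        pixels[i] = (p & 0xFF000000u)
                  | ((uint32_t)r << 16)
                  | ((uint32_t)g << 8)
                  | (uint32_t)b;
    }
}
```

Every new filter then repeats only the body between the braces, and the call happens once per image: brightness_all(pixels, width, height, 16).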

Upvotes: 1

Aaron Digulla

Reputation: 328790

Calling functions takes time. Usually you don't notice, but here you call the function millions of times (about two million times for a full HD 1920x1080 image). Modern cameras create 16-megapixel images. If each call took 1 µs, the accumulated time for calling the function (without actually executing the body) would be 16 seconds.

How can you make it faster? Some suggestions:

  1. Instead of passing four parameters, use a struct:

     struct data { int r,g,b; float* param; };

    Allocate this once and reuse it. Now you can call func with a single argument.

  2. Memory layout might be a problem. param can be anywhere in memory. Copy its value into struct data instead:

     struct data { int r,g,b, add; };

    The reason is that param can be anywhere in memory, which means it's probably in a different cache line. If you can fit all the data into a single 64-byte structure (a typical cache line size), it can all sit in a single cache line, which can give a huge performance boost.

    But this probably doesn't matter in your case, since you always access param[0]. It is more of an issue when you access the array in a random pattern.

  3. Swap the shift and bit mask operations, so that you shift first and mask afterwards:

     r = (int) ((line[x] >> 16) & 0xFF);

    This can give a small boost, since all three colors are then masked with the same constant 0xFF, which allows the compiler to load that constant into a CPU register once.

  4. When calling functions, all the CPU registers need to be "saved/restored". That costs time. When the function is inlined, the compiler knows which CPU registers are clobbered and can optimize accordingly.

    Actually, the CPU registers aren't saved (at least I haven't seen that for a long time). Modern compilers just assume that after calling the function, all of them have been changed.

  5. Note that inline has no effect here, since you pass the function by pointer instead of calling it directly.

  6. Use threads. This is dead simple to parallelize: run the function N times (one per CPU core), each on 1/N-th of the data. That will give you roughly an N-fold performance boost.
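To make suggestions 1 to 3 concrete, here is a rough sketch of how they might fit together. None of these names come from the question: pixel_data, StructFilter, brightness_struct, unpack_pixel, pack_pixel, filter_struct and clamp_channel are all hypothetical, the 0xAARRGGBB layout is an assumption, and the clamping is my addition. It still makes one indirect call per pixel, so treat it as a way to make each call cheaper, not as a replacement for inlining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: channels and the parameter value live together,
   so the filter takes a single pointer argument. */
struct pixel_data {
    int r, g, b;
    int add;
};

typedef void (*StructFilter)(struct pixel_data *);

static void brightness_struct(struct pixel_data *d) {
    d->r += d->add;
    d->g += d->add;
    d->b += d->add;
}

/* Shift first, then mask: every channel uses the same constant 0xFF. */
static void unpack_pixel(uint32_t p, struct pixel_data *d) {
    d->r = (int)((p >> 16) & 0xFF);
    d->g = (int)((p >> 8) & 0xFF);
    d->b = (int)(p & 0xFF);
}

/* Hypothetical helper: clamp a channel value to the 0..255 range. */
static int clamp_channel(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Rebuild the pixel, keeping the alpha byte of the original value. */
static uint32_t pack_pixel(uint32_t original, const struct pixel_data *d) {
    return (original & 0xFF000000u)
         | ((uint32_t)clamp_channel(d->r) << 16)
         | ((uint32_t)clamp_channel(d->g) << 8)
         | (uint32_t)clamp_channel(d->b);
}

static void filter_struct(uint32_t *pixels, size_t count,
                          StructFilter func, int add) {
    struct pixel_data d;   /* allocated once, reused for every pixel */
    size_t i;

    assert(pixels != NULL && func != NULL);
    d.add = add;

    for (i = 0; i < count; i++) {
        unpack_pixel(pixels[i], &d);
        func(&d);
        pixels[i] = pack_pixel(pixels[i], &d);
    }
}
```

A call would look like filter_struct(pixels, (size_t)width * height, brightness_struct, 16).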

Upvotes: 4

smbear

Reputation: 1041

I think the problem is that you are passing your function as a pointer. Because of that, brightness() is not inlined by the compiler.

When you copy the definition of brightness() into the filter() function, you get your desired result: you have inlined the function by hand.
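A small side-by-side sketch of the difference, in case it helps. AgFilter and brightness follow the question; struct rgb, filter_indirect and filter_brightness are hypothetical names I made up for the comparison. Whether the compiler really inlines the direct call (or manages to resolve the indirect one) depends on the compiler and the optimization level, so measure before relying on it.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*AgFilter)(int *, int *, int *, float *);

static inline void brightness(int *r, int *g, int *b, float *param) {
    float add = param[0];

    *r = (int)(*r + add);
    *g = (int)(*g + add);
    *b = (int)(*b + add);
}

/* Hypothetical pixel type used only for this comparison. */
struct rgb {
    int r, g, b;
};

/* Indirect version: inside this function the callee is only known at run
   time, so the compiler typically has to emit a real call per pixel. */
static void filter_indirect(struct rgb *px, size_t count,
                            AgFilter func, float *params) {
    size_t i;
    assert(func != NULL);
    for (i = 0; i < count; i++)
        func(&px[i].r, &px[i].g, &px[i].b, params);
}

/* Direct version: the callee is visible at compile time, so the compiler
   is free to inline brightness() into the loop. */
static void filter_brightness(struct rgb *px, size_t count, float *params) {
    size_t i;
    for (i = 0; i < count; i++)
        brightness(&px[i].r, &px[i].g, &px[i].b, params);
}
```

Both versions compute the same result; only the way brightness() is reached differs.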

Upvotes: 4
