Reputation: 23972
I'm trying to implement an image processing function. Here it is:
typedef void (*AgFilter)(int*, int*, int*, float*);
static void filter(AndroidBitmapInfo* info, void* pixels, AgFilter func, void* params){
for(y = 0; y < height; y++){
for(x = 0; x < width; x++){
//initialize r, g, b
func(&r, &g, &b, params); //here is the problem
}
}
}
I'm passing this function as func:
static inline void brightness(int *r, int *g, int *b, float* param){
float add = param[0];
*r += add;
*g += add;
*b += add;
}
The problem is that it runs extremely slowly. I can understand that to some degree, but if, instead of passing the function by pointer, I write its body directly inside filter (in place of the func call), it runs much, much faster. Where is the problem?
P.S. Note that this is C, not C++.
EDIT
This one works fast:
static void filter(AndroidBitmapInfo* info, void* pixels, int add){
for(y = 0; y < height; y++){
for(x = 0; x < width; x++){
//initialize r, g, b
r += add;
g += add;
b += add;
}
}
}
Upvotes: 3
Views: 148
Reputation: 60711
By far the biggest improvement you can make is to avoid calling the function once for each pixel. It is trivial to move your loop inside the brightness function.
static inline void brightness(int *r, int *g, int *b, float* param){
float add = param[0];
for(y = 0; y < height; y++)
for(x = 0; x < width; x++){
//initialize r, g, b
*r += add;
*g += add;
*b += add;
}
}
Now, I know you don't want to have to duplicate the loop-iteration code inside every different filter function you might write, so this is one of the cases where using macros can really make a difference. Try something like this (untested).
#define FOR_EACH_PIXEL for(y = 0; y < height; y++) \
for(x = 0; x < width; x++)
static inline void brightness(int *r, int *g, int *b, float* param){
float add = param[0];
FOR_EACH_PIXEL
{
//initialize r, g, b
*r += add;
*g += add;
*b += add;
}
}
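A variation on the same idea, as a minimal sketch rather than a drop-in replacement: the macro can generate the whole filter function, so only the per-pixel body differs between filters. The names DEFINE_FILTER, filter_brightness and clamp_u8 are hypothetical, and the packed 0xAARRGGBB pixel layout in a contiguous buffer is an assumption (the question does not show how r, g, b are initialized).

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a channel value to the 0..255 range. */
static inline int clamp_u8(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Hypothetical macro: expands to a complete filter function whose loop
 * contains BODY directly, so no function is called per pixel.
 * Assumes packed 0xAARRGGBB pixels in a contiguous buffer. */
#define DEFINE_FILTER(NAME, BODY)                                        \
    static void NAME(uint32_t *pixels, size_t count, const float *param) \
    {                                                                    \
        for (size_t i = 0; i < count; i++) {                             \
            uint32_t p = pixels[i];                                      \
            int r = (int)((p >> 16) & 0xFF);                             \
            int g = (int)((p >> 8) & 0xFF);                              \
            int b = (int)(p & 0xFF);                                     \
            BODY                                                         \
            pixels[i] = (p & 0xFF000000u)                                \
                      | ((uint32_t)clamp_u8(r) << 16)                    \
                      | ((uint32_t)clamp_u8(g) << 8)                     \
                      | (uint32_t)clamp_u8(b);                           \
        }                                                                \
    }

/* The per-pixel body sees r, g, b and param as ordinary local names. */
DEFINE_FILTER(filter_brightness, {
    int add = (int)param[0];
    r += add;
    g += add;
    b += add;
})
```

Each filter is then called as, for example, filter_brightness(pixels, width * height, params). The cost is one copy of the loop per filter in the binary, which is usually an acceptable trade for removing the per-pixel call.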
Upvotes: 1
Reputation: 328790
Calling functions takes time. Usually you don't notice, but here you call the function millions of times (about two million times for a full HD 1920x1080 image). Modern cameras create 16-megapixel images. If each call takes 1 µs, the accumulated time just for calling the function (without actually executing the body) will be 16 seconds.
How can you make it faster? Some suggestions:
Instead of passing four parameters, use a struct:
struct data { int r, g, b; float* param; };
Allocate this once and reuse it. Now you can call func with a single argument.
Memory layout might be a problem. param can be anywhere in memory. Copy its value into struct data instead:
struct data { int r, g, b, add; };
The reason for this is that param can be anywhere in memory, which means it's probably in a different cache line. If you can fit all the data into a single 64-byte structure, it will all fit into a single cache line, which can give a huge performance boost.
But probably not in your case, since you always access param[0]. This is more of an issue when you access the array in a random way.
Swap shift and bit mask operations:
r = (int) ((line[x] >> 16) & 0xFF);
This can give a small boost, since all three colors will now be masked with 0xFF, and that allows the compiler to load the constant into a CPU register once.
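As a minimal sketch of the shift-then-mask form for all three channels: the helper name unpack_rgb is hypothetical, and the packed 0xAARRGGBB layout is an assumption, since the question does not show the pixel format.

```c
#include <stdint.h>

/* Shift first, then mask: every channel uses the same 0xFF constant.
 * Assumes a packed 0xAARRGGBB pixel. */
static inline void unpack_rgb(uint32_t pixel, int *r, int *g, int *b) {
    *r = (int)((pixel >> 16) & 0xFF);
    *g = (int)((pixel >> 8) & 0xFF);
    *b = (int)(pixel & 0xFF);
}
```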
When calling functions, all the CPU registers need to be "saved/restored". That costs time. When the function is inlined, the compiler knows which CPU registers are clobbered and can optimize accordingly.
Actually, the CPU registers aren't saved (at least I haven't seen that for a long time). Modern compilers just assume that, after the call, all of them have been changed.
Note that inline has no effect here, since you call the function through a pointer instead of calling it directly.
Use threads. This is dead simple to parallelize: run the function N times (one per CPU core), each on 1/N-th of the data. That will give you roughly an N-fold performance boost.
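The threading suggestion could look roughly like this with POSIX threads. This is a sketch under stated assumptions: a fixed NUM_THREADS of 4 instead of a detected core count, packed 0xAARRGGBB pixels in one contiguous buffer, and hypothetical names (struct slice, brightness_worker, brightness_parallel).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4  /* assumption: ideally the number of CPU cores */

/* Hypothetical per-thread work item: one contiguous slice of the image. */
struct slice {
    uint32_t *pixels;
    size_t count;
    int add;
};

static int clamp_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Applies the brightness change to one slice; slices never overlap,
 * so no locking is needed. */
static void *brightness_worker(void *arg) {
    struct slice *s = arg;
    for (size_t i = 0; i < s->count; i++) {
        uint32_t p = s->pixels[i];
        int r = clamp_u8((int)((p >> 16) & 0xFF) + s->add);
        int g = clamp_u8((int)((p >> 8) & 0xFF) + s->add);
        int b = clamp_u8((int)(p & 0xFF) + s->add);
        s->pixels[i] = (p & 0xFF000000u) | ((uint32_t)r << 16)
                     | ((uint32_t)g << 8) | (uint32_t)b;
    }
    return NULL;
}

static void brightness_parallel(uint32_t *pixels, size_t count, int add) {
    pthread_t threads[NUM_THREADS];
    struct slice slices[NUM_THREADS];
    int created[NUM_THREADS];
    size_t chunk = count / NUM_THREADS;

    for (int t = 0; t < NUM_THREADS; t++) {
        size_t begin = (size_t)t * chunk;
        /* The last slice also takes the remainder. */
        size_t end = (t == NUM_THREADS - 1) ? count : begin + chunk;
        slices[t].pixels = pixels + begin;
        slices[t].count = end - begin;
        slices[t].add = add;
        created[t] = pthread_create(&threads[t], NULL,
                                    brightness_worker, &slices[t]) == 0;
        if (!created[t])
            brightness_worker(&slices[t]);  /* fall back to this thread */
    }
    for (int t = 0; t < NUM_THREADS; t++)
        if (created[t])
            pthread_join(threads[t], NULL);
}
```

Creating threads has its own cost, so for small images the single-threaded loop may well be faster; a long-lived thread pool would avoid paying that cost on every call.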
Upvotes: 4
Reputation: 1041
I think the problem is that you are passing your function as a pointer. Because of that, brightness() is not inlined by the compiler.
When you copy the definition of brightness() into the filter() function, you get your desired result: you have inlined the function by hand.
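To illustrate the difference, here is a minimal sketch in which the loop names brightness() directly instead of going through a pointer, so the compiler can see the callee's body and is free to inline it. The function filter_brightness is hypothetical, and separate per-channel arrays are assumed purely to keep the example short.

```c
#include <stddef.h>

/* Same per-pixel operation as in the question. */
static inline void brightness(int *r, int *g, int *b, const float *param) {
    int add = (int)param[0];
    *r += add;
    *g += add;
    *b += add;
}

/* Hypothetical specialized loop: the callee is known at compile time,
 * so an optimizing compiler can inline it and hoist param[0] out of
 * the loop. Assumes one int array per channel. */
static void filter_brightness(int *r, int *g, int *b, size_t count,
                              const float *param) {
    for (size_t i = 0; i < count; i++)
        brightness(&r[i], &g[i], &b[i], param);
}
```

Whether inlining actually happens still depends on the compiler and optimization level, so it is worth checking the generated code or simply measuring.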
Upvotes: 4