mrgloom

Reputation: 21602

How to speed up bilinear interpolation of image?

I'm trying to rotate an image using bilinear interpolation, but it's too slow for real-time use on big images.

The code is something like:

for(int y=0;y<dst_h;++y)
{
    for(int x=0;x<dst_w;++x)
    {
        //do inverse transform
        fPoint pt(Transform(Point(x, y)));

        //in coor of src
        int x1= (int)floor(pt.x);
        int y1= (int)floor(pt.y);
        int x2= x1+1;
        int y2= y1+1;


        if((x1>=0&&x1<src_w&&y1>=0&&y1<src_h)&&(x2>=0&&x2<src_w&&y2>=0&&y2<src_h))
        {
                Mask[y][x]= 1; //show pixel

                float dx1= pt.x-x1;
                float dx2= 1-dx1;
                float dy1= pt.y-y1;
                float dy2= 1-dy1;

                //bilinear
                pd[x].blue= (dy2*(ps[y1*src_w+x1].blue*dx2+ps[y1*src_w+x2].blue*dx1)+
                        dy1*(ps[y2*src_w+x1].blue*dx2+ps[y2*src_w+x2].blue*dx1));
                pd[x].green= (dy2*(ps[y1*src_w+x1].green*dx2+ps[y1*src_w+x2].green*dx1)+
                        dy1*(ps[y2*src_w+x1].green*dx2+ps[y2*src_w+x2].green*dx1));
                pd[x].red= (dy2*(ps[y1*src_w+x1].red*dx2+ps[y1*src_w+x2].red*dx1)+
                        dy1*(ps[y2*src_w+x1].red*dx2+ps[y2*src_w+x2].red*dx1));

                //nearest neighbour
                //pd[x]= ps[((int)pt.y)*src_w+(int)pt.x];
        }
        else
                Mask[y][x]= 0; //transparent pixel
    }
    pd+= dst_w;
}

How can I speed up this code? I tried to parallelize it, but there seems to be no speedup, possibly because of the memory access pattern (?).

Upvotes: 1

Views: 8096

Answers (2)

thealmightygrant

Reputation: 701

The key is to do most of your computations as ints. The only part that needs floats is the weighting. See here for a good resource.

From that same resource:

int px = (int)x; // floor of x
int py = (int)y; // floor of y
const int stride = img->width;
const Pixel* p0 = img->data + px + py * stride; // pointer to first pixel

// load the four neighboring pixels
const Pixel& p1 = p0[0 + 0 * stride];
const Pixel& p2 = p0[1 + 0 * stride];
const Pixel& p3 = p0[0 + 1 * stride];
const Pixel& p4 = p0[1 + 1 * stride];

// Calculate the weights for each pixel
float fx = x - px;
float fy = y - py;
float fx1 = 1.0f - fx;
float fy1 = 1.0f - fy;

int w1 = fx1 * fy1 * 256.0f;
int w2 = fx  * fy1 * 256.0f;
int w3 = fx1 * fy  * 256.0f;
int w4 = fx  * fy  * 256.0f;

// Calculate the weighted sum of pixels (for each color channel)
int outr = p1.r * w1 + p2.r * w2 + p3.r * w3 + p4.r * w4;
int outg = p1.g * w1 + p2.g * w2 + p3.g * w3 + p4.g * w4;
int outb = p1.b * w1 + p2.b * w2 + p3.b * w3 + p4.b * w4;
int outa = p1.a * w1 + p2.a * w2 + p3.a * w3 + p4.a * w4;
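The snippet above stops at the weighted sums. Because the weights were scaled by 256, each sum presumably still has to be shifted right by 8 bits to get back to the 0..255 range. A minimal self-contained sketch of that, assuming hypothetical `Pixel` and `Image` types (they are not defined in the quoted code) and a caller that keeps the sample point inside the image:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical types standing in for the ones used in the quoted snippet.
struct Pixel { std::uint8_t r, g, b, a; };

struct Image {
    int width;
    int height;
    std::vector<Pixel> data; // row-major, width * height entries
};

// Bilinear sample at (x, y) with 8-bit fixed-point weights.
// Assumption: 0 <= x < width - 1 and 0 <= y < height - 1 (no bounds check here).
Pixel sampleBilinear(const Image& img, float x, float y)
{
    const int px = static_cast<int>(x); // equals floor(x) because x >= 0
    const int py = static_cast<int>(y);
    const int stride = img.width;
    const Pixel* p0 = img.data.data() + px + py * stride;

    // The four neighbouring pixels.
    const Pixel& p1 = p0[0];
    const Pixel& p2 = p0[1];
    const Pixel& p3 = p0[stride];
    const Pixel& p4 = p0[stride + 1];

    // Fractional parts of the sample position.
    const float fx = x - px;
    const float fy = y - py;
    const float fx1 = 1.0f - fx;
    const float fy1 = 1.0f - fy;

    // Weights scaled by 256. Truncation can make them sum to slightly
    // less than 256, which costs a little accuracy.
    const int w1 = static_cast<int>(fx1 * fy1 * 256.0f);
    const int w2 = static_cast<int>(fx  * fy1 * 256.0f);
    const int w3 = static_cast<int>(fx1 * fy  * 256.0f);
    const int w4 = static_cast<int>(fx  * fy  * 256.0f);

    // Integer weighted sum, then >> 8 to undo the 256 scaling.
    auto mix = [&](int a, int b, int c, int d) {
        return static_cast<std::uint8_t>((a * w1 + b * w2 + c * w3 + d * w4) >> 8);
    };

    return Pixel{ mix(p1.r, p2.r, p3.r, p4.r),
                  mix(p1.g, p2.g, p3.g, p4.g),
                  mix(p1.b, p2.b, p3.b, p4.b),
                  mix(p1.a, p2.a, p3.a, p4.a) };
}
```

Calling `sampleBilinear(img, pt.x, pt.y)` would replace the three per-channel float expressions in the question's loop. The bounds test from the question still has to happen before the call.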

Upvotes: 5

Spektre

Reputation: 51835

Wow, you are doing a lot inside the innermost loop:

1. float to int conversions

  • you can do everything on floats
  • floats are pretty fast these days
  • the conversion is what is killing you
  • you are also mixing floats and ints together (if I see it right), which costs the same conversions

2. Transform(x,y)

  • any unnecessary call causes heap thrashing and slows things down
  • instead add 2 variables xx,yy and interpolate them inside your for loops

3. if ...

  • why the heck are you adding an if?
  • limit the for ranges before the loop, not inside it
  • the background can be filled by other for loops before or after
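Point 2 can be sketched roughly as follows. This assumes the transform is a pure rotation about the image centre and uses a single-channel 8-bit image to keep it short. The function name `rotateBilinear` and its parameters are made up for illustration. For a rotation, moving one pixel to the right in the destination moves the source position by the constant step (cos, -sin), so the per-pixel `Transform` call turns into two additions. The sketch still keeps the bounds `if` inside the loop. Clipping the x range per row, as point 3 suggests, is left out because it needs care with float rounding.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rotate a w x h single-channel image about its centre by `angle` radians.
// Destination pixels whose 2x2 source neighbourhood is not fully inside
// the image stay 0 and get mask = 0 (transparent), as in the question.
void rotateBilinear(const std::vector<std::uint8_t>& src, int w, int h,
                    float angle,
                    std::vector<std::uint8_t>& dst,
                    std::vector<std::uint8_t>& mask)
{
    dst.assign(static_cast<std::size_t>(w) * h, 0);
    mask.assign(static_cast<std::size_t>(w) * h, 0);

    const float c = std::cos(angle);
    const float s = std::sin(angle);
    const float cx = 0.5f * (w - 1);
    const float cy = 0.5f * (h - 1);

    for (int y = 0; y < h; ++y) {
        // Inverse rotation of destination pixel (0, y), computed once per row
        // so that rounding error does not accumulate across rows.
        float sx =  c * (0 - cx) + s * (y - cy) + cx;
        float sy = -s * (0 - cx) + c * (y - cy) + cy;

        // Each step in x adds the constant vector (c, -s) to the source position.
        for (int x = 0; x < w; ++x, sx += c, sy -= s) {
            const int x1 = static_cast<int>(std::floor(sx));
            const int y1 = static_cast<int>(std::floor(sy));
            if (x1 < 0 || y1 < 0 || x1 + 1 >= w || y1 + 1 >= h)
                continue; // outside the source: leave transparent

            const float fx = sx - x1;
            const float fy = sy - y1;
            const std::uint8_t* p = &src[static_cast<std::size_t>(y1) * w + x1];

            // Interpolate along x on both rows, then along y.
            const float top = p[0] + fx * (p[1] - p[0]);
            const float bot = p[w] + fx * (p[w + 1] - p[w]);

            const std::size_t i = static_cast<std::size_t>(y) * w + x;
            dst[i] = static_cast<std::uint8_t>(top + fy * (bot - top) + 0.5f);
            mask[i] = 1;
        }
    }
}
```

Since every row starts from its own freshly computed position, the rows are independent of each other and the outer loop can be split across threads.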

Upvotes: 1
