Reputation: 6844
I've never written assembly code for SSE optimization, so sorry if this is a noob question. This article explains how to vectorize a for loop
with a conditional statement. However, my code (taken from here) is of the form:
for (int j=-halfHeight; j<=halfHeight; ++j)
{
    for(int i=-halfWidth; i<=halfWidth; ++i)
    {
        const float rx = ofsx + j * a12;
        const float ry = ofsy + j * a22;
        float wx = rx + i * a11;
        float wy = ry + i * a21;
        const int x = (int) floor(wx);
        const int y = (int) floor(wy);
        if (x >= 0 && y >= 0 && x < width && y < height)
        {
            // compute weights
            wx -= x; wy -= y;
            // bilinear interpolation
            *out++ =
                (1.0f - wy) * ((1.0f - wx) * im.at<float>(y,x) + wx * im.at<float>(y,x+1)) +
                (       wy) * ((1.0f - wx) * im.at<float>(y+1,x) + wx * im.at<float>(y+1,x+1));
        } else {
            *out++ = 0;
        }
    }
}
So, from my understanding, there are several differences from the linked article:
- The for loop is nested: I've only ever seen a single-level for in vectorization examples, never a nested loop.
- The out index isn't based on i or j (so it's not out[i] or out[j]): how can I fill out this way?
In particular, I'm confused because for indexes are usually used as array indexes, whereas here they are used to compute variables while out is advanced by one element on every iteration.
I'm using icpc with -O3 -xCORE-AVX2 -qopt-report=5 and a bunch of other optimization flags. According to Intel Advisor, this loop is not vectorized, and using #pragma omp simd generates warning #15552: loop was not vectorized with "simd"
Upvotes: 2
Views: 376
Reputation: 14947
Bilinear interpolation is a rather tricky operation to vectorize, and I wouldn't try it for your first SSE trick. The problem is that the values you need to fetch are not nicely ordered: they're sometimes repeated, sometimes skipped. The good news is that interpolating images is a common operation, and you can likely find a pre-written library to do it. OpenCV's remap() is always a good choice. Just build two arrays of wx and wy, which represent the fractional source location of each output pixel, and let remap() do the interpolation.
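As a rough illustration of the "build two arrays" step, here is a minimal sketch in plain C++ (no OpenCV dependency). The SourceMaps struct and the buildSourceMaps name are hypothetical; the parameters are taken from the loop in the question. The output position is computed explicitly from i and j, which is what the *out++ in the original does implicitly, and the inner loop has no branch and no scattered loads, so it is the kind of loop a compiler can usually vectorize on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the two coordinate maps (not an OpenCV type).
struct SourceMaps {
    int cols = 0;              // patch width  = 2 * halfWidth  + 1
    int rows = 0;              // patch height = 2 * halfHeight + 1
    std::vector<float> mapX;   // fractional source x (wx) per output pixel
    std::vector<float> mapY;   // fractional source y (wy) per output pixel
};

// Fills mapX/mapY with the same wx/wy values the question's loop computes.
SourceMaps buildSourceMaps(int halfWidth, int halfHeight,
                           float ofsx, float ofsy,
                           float a11, float a12, float a21, float a22)
{
    assert(halfWidth >= 0 && halfHeight >= 0);

    SourceMaps m;
    m.cols = 2 * halfWidth + 1;
    m.rows = 2 * halfHeight + 1;
    m.mapX.resize(static_cast<std::size_t>(m.cols) * m.rows);
    m.mapY.resize(m.mapX.size());

    for (int j = -halfHeight; j <= halfHeight; ++j) {
        // Row-dependent part, hoisted out of the inner loop.
        const float rx = ofsx + j * a12;
        const float ry = ofsy + j * a22;
        // Explicit row offset: equivalent to the running *out++ pointer.
        const std::size_t rowStart =
            static_cast<std::size_t>(j + halfHeight) * m.cols;

        for (int i = -halfWidth; i <= halfWidth; ++i) {
            const std::size_t idx = rowStart + (i + halfWidth);
            m.mapX[idx] = rx + i * a11;
            m.mapY[idx] = ry + i * a21;
        }
    }
    return m;
}
```

With OpenCV available, the two vectors could be wrapped as CV_32FC1 matrices of size rows x cols and passed to cv::remap() as map1 and map2 with cv::INTER_LINEAR. Note that remap()'s border handling is an assumption to check against your needs: it will likely not reproduce the all-or-nothing bounds test of the original loop exactly at the image edges.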
However, in this case, it looks like an affine transform. That is, the fractional source location is obtained from the output pixel position by a 2x3 matrix multiplication. The entries of that matrix come from your ofsx/ofsy offsets and the a11/a12/a21/a22 variables. OpenCV has such a transform. Read about it here: http://docs.opencv.org/3.1.0/d4/d61/tutorial_warp_affine.html
All you'll have to do is map your input variables into matrix form and call the affine transform.
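To make the "map your input variables into matrix form" step concrete, here is a small sketch under the assumption that output pixels are indexed from the top-left of the patch, i.e. u = i + halfWidth and v = j + halfHeight. Substituting i = u - halfWidth and j = v - halfHeight into wx = ofsx + j*a12 + i*a11 (and likewise for wy) gives the 2x3 matrix below. Affine2x3, patchToSourceMatrix and applyAffine are hypothetical helper names, not OpenCV API.

```cpp
#include <array>
#include <cassert>

// Row-major 2x3 matrix {m00, m01, m02, m10, m11, m12} (hypothetical alias).
using Affine2x3 = std::array<double, 6>;

// Builds the matrix that maps an output pixel (u, v), counted from the
// top-left corner of the patch, to the fractional source location (wx, wy).
Affine2x3 patchToSourceMatrix(int halfWidth, int halfHeight,
                              double ofsx, double ofsy,
                              double a11, double a12,
                              double a21, double a22)
{
    assert(halfWidth >= 0 && halfHeight >= 0);
    return Affine2x3{{
        a11, a12, ofsx - a11 * halfWidth - a12 * halfHeight,
        a21, a22, ofsy - a21 * halfWidth - a22 * halfHeight
    }};
}

// Applies the 2x3 matrix to (u, v); returns {wx, wy}.
std::array<double, 2> applyAffine(const Affine2x3& m, double u, double v)
{
    return {{ m[0] * u + m[1] * v + m[2],
              m[3] * u + m[4] * v + m[5] }};
}
```

Because this matrix maps output pixels to source locations rather than the other way around, passing it to cv::warpAffine() would presumably require the cv::WARP_INVERSE_MAP flag together with cv::INTER_LINEAR, and a destination size of (2*halfWidth+1) x (2*halfHeight+1). As with remap(), pixels near the image border may come out slightly different from the original loop, which zeroes any pixel whose top-left neighbour falls outside the image.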
Upvotes: 4