OpenCV FAST corner detection SSE implementation walkthrough

Question

Could someone help me understanding the SSE implementation of the FAST corner detection in OpenCV? I understand the algorithm but not the implementation. Could somebody walk me through the code?

The code is long, so thank you in advance.

I am using OpenCV 2.4.11 and the code goes like this:

__m128i delta = _mm_set1_epi8(-128);
__m128i t = _mm_set1_epi8((char)threshold);
__m128i m0, m1;
__m128i v0 = _mm_loadu_si128((const __m128i*)ptr);

I think the following have something to do with threshold checking, but can't understand the use of delta

__m128i v1 = _mm_xor_si128(_mm_subs_epu8(v0, t), delta);
v0 = _mm_xor_si128(_mm_adds_epu8(v0, t), delta);

Now it checks the neighboring 4 pixels, but again, what is the use of delta?

__m128i x0 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[0])), delta);
__m128i x1 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[4])), delta);
__m128i x2 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[8])), delta);
__m128i x3 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[12])), delta);
m0 = _mm_and_si128(_mm_cmpgt_epi8(x0, v0), _mm_cmpgt_epi8(x1, v0));
m1 = _mm_and_si128(_mm_cmpgt_epi8(v1, x0), _mm_cmpgt_epi8(v1, x1));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x1, v0), _mm_cmpgt_epi8(x2, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x1), _mm_cmpgt_epi8(v1, x2)));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x2, v0), _mm_cmpgt_epi8(x3, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x2), _mm_cmpgt_epi8(v1, x3)));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x3, v0), _mm_cmpgt_epi8(x0, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x3), _mm_cmpgt_epi8(v1, x0)));
m0 = _mm_or_si128(m0, m1);

Here it checks the continuity of the neighboring pixels. (Right?)

int mask = _mm_movemask_epi8(m0);
if( mask == 0 )
    continue;

This is another puzzle for me. Why shifting 8 bytes to the left? I assume the mask tells me the location of the corner candidate, but why 8 bytes?

if( (mask & 255) == 0 )
{
    j -= 8;
    ptr -= 8;
    continue;
}

I gave up at this point...

__m128i c0 = _mm_setzero_si128(), c1 = c0, max0 = c0, max1 = c0;
for( k = 0; k < N; k++ )
{
    __m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i*)(ptr + pixel[k])), delta);
    m0 = _mm_cmpgt_epi8(x, v0);
    m1 = _mm_cmpgt_epi8(v1, x);

    c0 = _mm_and_si128(_mm_sub_epi8(c0, m0), m0);
    c1 = _mm_and_si128(_mm_sub_epi8(c1, m1), m1);

    max0 = _mm_max_epu8(max0, c0);
    max1 = _mm_max_epu8(max1, c1);
}

max0 = _mm_max_epu8(max0, max1);
int m = _mm_movemask_epi8(_mm_cmpgt_epi8(max0, K16));

for( k = 0; m > 0 && k < 16; k++, m >>= 1 )
    if(m & 1)
    {
        cornerpos[ncorners++] = j+k;
        if(nonmax_suppression)
            curr[j+k] = (uchar)cornerScore(ptr+k, pixel, threshold);
    }

akarsakov · Accepted Answer

As harold said, delta is used to make unsigned comparsion.

Let's describe this implementation by steps:

__m128i x0 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[0])), delta); __m128i x1 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[4])), delta); __m128i x2 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[8])), delta); __m128i x3 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[12])), delta); m0 = _mm_and_si128(_mm_cmpgt_epi8(x0, v0), _mm_cmpgt_epi8(x1, v0)); m1 = _mm_and_si128(_mm_cmpgt_epi8(v1, x0), _mm_cmpgt_epi8(v1, x1)); m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x1, v0), _mm_cmpgt_epi8(x2, v0))); ......

Here it's not checking of 4 neighboring pixels. It checks 4 points, for example, like this: enter image description here

Here they check that "corner condition" is true for this 4 points, because if it's not true there are no 8 neighboring pixels that satisfy "corner condition", so it's not corner pixel. If mask is zero it means that all pixels in vector can't be corner so we shift left for 16 pixels.

int mask = _mm_movemask_epi8(m0);
if( mask == 0 )
    continue;

If mask is not zero, but for first 8 pixels "corner condition" is not true they shift left only for 8 pixels to check remain pixels on next iteration.

if( (mask & 255) == 0 )
{
    j -= 8;
    ptr -= 8;
    continue;
}

And final step. Here they count number of neighboring pixels which are greater than x + threshold to c0 counter and which are less than x - threshold to c1 counter.

Here generating mask for such conditions:

__m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i*)(ptr + pixel[k])), delta);
m0 = _mm_cmpgt_epi8(x, v0);
m1 = _mm_cmpgt_epi8(v1, x);

Note that if condition is true for element of vector his value set to 0xFF or -1 since we treat him as signed char.

c0 = _mm_and_si128(_mm_sub_epi8(c0, m0), m0); 
c1 = _mm_and_si128(_mm_sub_epi8(c1, m1), m1);

If element of mask is -1 it accumulates to c0 or c1 counter since of substraction (for example c0 - (-1)) . But if it equal to zero they reset counter to zero (_mm_and_si128).

Than they need to store maximum value of counters:

max0 = _mm_max_epu8(max0, c0);
max1 = _mm_max_epu8(max1, c1);

So they store maximum number of neighboring pixels which satisfy "corner condition".

Here they determine which pixels are actually corners and which are not:

max0 = _mm_max_epu8(max0, max1);
int m = _mm_movemask_epi8(_mm_cmpgt_epi8(max0, K16));

for( k = 0; m > 0 && k < 16; k++, m >>= 1 )
    if(m & 1)
    {
        cornerpos[ncorners++] = j+k;
        if(nonmax_suppression)
            curr[j+k] = (uchar)cornerScore(ptr+k, pixel, threshold);
    }

I hope it will help. I'm sorry for my bad English.

OpenCV FAST corner detection SSE implementation walkthrough

Answers (2)

Related Questions