Reputation: 57
Could someone help me understanding the SSE implementation of the FAST corner detection in OpenCV? I understand the algorithm but not the implementation. Could somebody walk me through the code?
The code is long, so thank you in advance.
I am using OpenCV 2.4.11 and the code goes like this:
__m128i delta = _mm_set1_epi8(-128);
__m128i t = _mm_set1_epi8((char)threshold);
__m128i m0, m1;
__m128i v0 = _mm_loadu_si128((const __m128i*)ptr);
I think the following have something to do with threshold checking, but can't understand the use of delta
__m128i v1 = _mm_xor_si128(_mm_subs_epu8(v0, t), delta);
v0 = _mm_xor_si128(_mm_adds_epu8(v0, t), delta);
Now it checks the neighboring 4 pixels, but again, what is the use of delta?
__m128i x0 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[0])), delta);
__m128i x1 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[4])), delta);
__m128i x2 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[8])), delta);
__m128i x3 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[12])), delta);
m0 = _mm_and_si128(_mm_cmpgt_epi8(x0, v0), _mm_cmpgt_epi8(x1, v0));
m1 = _mm_and_si128(_mm_cmpgt_epi8(v1, x0), _mm_cmpgt_epi8(v1, x1));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x1, v0), _mm_cmpgt_epi8(x2, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x1), _mm_cmpgt_epi8(v1, x2)));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x2, v0), _mm_cmpgt_epi8(x3, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x2), _mm_cmpgt_epi8(v1, x3)));
m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x3, v0), _mm_cmpgt_epi8(x0, v0)));
m1 = _mm_or_si128(m1, _mm_and_si128(_mm_cmpgt_epi8(v1, x3), _mm_cmpgt_epi8(v1, x0)));
m0 = _mm_or_si128(m0, m1);
Here it checks the continuity of the neighboring pixels. (Right?)
int mask = _mm_movemask_epi8(m0);
if( mask == 0 )
continue;
This is another puzzle for me. Why shifting 8 bytes to the left? I assume the mask tells me the location of the corner candidate, but why 8 bytes?
if( (mask & 255) == 0 )
{
j -= 8;
ptr -= 8;
continue;
}
I gave up at this point...
__m128i c0 = _mm_setzero_si128(), c1 = c0, max0 = c0, max1 = c0;
for( k = 0; k < N; k++ )
{
__m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i*)(ptr + pixel[k])), delta);
m0 = _mm_cmpgt_epi8(x, v0);
m1 = _mm_cmpgt_epi8(v1, x);
c0 = _mm_and_si128(_mm_sub_epi8(c0, m0), m0);
c1 = _mm_and_si128(_mm_sub_epi8(c1, m1), m1);
max0 = _mm_max_epu8(max0, c0);
max1 = _mm_max_epu8(max1, c1);
}
max0 = _mm_max_epu8(max0, max1);
int m = _mm_movemask_epi8(_mm_cmpgt_epi8(max0, K16));
for( k = 0; m > 0 && k < 16; k++, m >>= 1 )
if(m & 1)
{
cornerpos[ncorners++] = j+k;
if(nonmax_suppression)
curr[j+k] = (uchar)cornerScore<patternSize>(ptr+k, pixel, threshold);
}
Upvotes: 4
Views: 705
Reputation: 2154
As harold said, delta is used to make unsigned comparsion.
Let's describe this implementation by steps:
__m128i x0 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[0])), delta); __m128i x1 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[4])), delta); __m128i x2 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[8])), delta); __m128i x3 = _mm_sub_epi8(_mm_loadu_si128((const __m128i*)(ptr + pixel[12])), delta); m0 = _mm_and_si128(_mm_cmpgt_epi8(x0, v0), _mm_cmpgt_epi8(x1, v0)); m1 = _mm_and_si128(_mm_cmpgt_epi8(v1, x0), _mm_cmpgt_epi8(v1, x1)); m0 = _mm_or_si128(m0, _mm_and_si128(_mm_cmpgt_epi8(x1, v0), _mm_cmpgt_epi8(x2, v0))); ......
Here it's not checking of 4 neighboring pixels. It checks 4 points, for example, like this:
int mask = _mm_movemask_epi8(m0); if( mask == 0 ) continue;
if( (mask & 255) == 0 ) { j -= 8; ptr -= 8; continue; }
x + threshold
to c0
counter and which are less than x - threshold
to c1
counter.Here generating mask for such conditions:
__m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i*)(ptr + pixel[k])), delta); m0 = _mm_cmpgt_epi8(x, v0); m1 = _mm_cmpgt_epi8(v1, x);
Note that if condition is true for element of vector his value set to 0xFF or -1 since we treat him as signed char.
c0 = _mm_and_si128(_mm_sub_epi8(c0, m0), m0); c1 = _mm_and_si128(_mm_sub_epi8(c1, m1), m1);
If element of mask is -1 it accumulates to c0
or c1
counter since of substraction (for example c0 - (-1)
) . But if it equal to zero they reset counter to zero (_mm_and_si128
).
Than they need to store maximum value of counters:
max0 = _mm_max_epu8(max0, c0); max1 = _mm_max_epu8(max1, c1);
So they store maximum number of neighboring pixels which satisfy "corner condition".
Here they determine which pixels are actually corners and which are not:
max0 = _mm_max_epu8(max0, max1);
int m = _mm_movemask_epi8(_mm_cmpgt_epi8(max0, K16));
for( k = 0; m > 0 && k < 16; k++, m >>= 1 )
if(m & 1)
{
cornerpos[ncorners++] = j+k;
if(nonmax_suppression)
curr[j+k] = (uchar)cornerScore<patternSize>(ptr+k, pixel, threshold);
}
I hope it will help. I'm sorry for my bad English.
Upvotes: 3
Reputation: 64903
delta
is a mask in which only the signbits are set. They use it because they want to compare for greater than unsigned, but there is only a signed comparison.
Adding 128 (or subtracting it, because -128 == 128) and xoring with it do the same (if you're working with bytes), because
a + b == (a ^ b) + ((a & b) << 1)
and if b
only has the top bit set, the ((a & b) << 1)
term must be zero (a & b
can have the top bit set, but it's shifted out).
Then as you can see in the diagram below, subtracting 128 "shifts" the entire range down in such a way that a signed comparison will give the same result as an unsigned comparison would have given on the original range.
|0 ... 127 ... 255| unsigned
|-128 ... 0 ... 127| signed
I don't know about the rest, I hope someone else can answer that.
Upvotes: 2