Efficiently compute two dissimilar numbers in ARM NEON

Question

I have an array of 16 integers, and I'd like to find the pair of ints in this array that are the most dissimilar to each other. Dissimilarity can be computed with this (pseudo) code:

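int diss(uint32_t x, uint32_t y)
{   // Sum of absolute per-byte differences (it could use the square for each byte instead).
    // Each byte is cast to int first so the subtraction cannot wrap around as unsigned.
    return
    abs((int)((x >> 24) & 0xFF) - (int)((y >> 24) & 0xFF)) +
    abs((int)((x >> 16) & 0xFF) - (int)((y >> 16) & 0xFF)) +
    abs((int)((x >>  8) & 0xFF) - (int)((y >>  8) & 0xFF)) +
    abs((int)((x >>  0) & 0xFF) - (int)((y >>  0) & 0xFF));
}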

void findDissimilar(uint32_t buf[16], uint32_t& x, uint32_t& y)
{
    int maxDiss = 0;
    for (int i=0; i<16; ++i)
    {
        for (int j=0; j<16; ++j)
        {
            int d = diss(buf[i], buf[j]);
            if (d > maxDiss)
            {
                maxDiss = d;
                x = buf[i];
                y = buf[j];
            }
        }
    }
}

On input, buf is already in NEON registers, if that matters. On output I should get two ints (perhaps it's better to leave them in a NEON register). How can I do this efficiently with ARM NEON, and what approaches should I try? Just to clarify, the question is about optimizing findDissimilar.
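To make the question concrete, here is a rough sketch of one direction that might be worth trying, not a tuned solution. It assumes the per-byte absolute difference maps to vabdq_u8 and that the four byte differences of each 32-bit lane can be summed with two widening pairwise adds (vpaddlq_u8, then vpaddlq_u16). The names diss4 and findDissimilarSketch are made up for illustration. The max search is still scalar here, and buf is read from memory rather than kept in registers. A scalar fallback is included so the same code also builds on non-NEON targets.

```cpp
#include <cstdint>
#include <cstdlib>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dissimilarity between `ref` and each of the four
// 32-bit values in v[0..3]; the four sums are written to out[0..3].
static void diss4(uint32_t ref, const uint32_t v[4], uint32_t out[4])
{
#if defined(__ARM_NEON)
    uint8x16_t a = vreinterpretq_u8_u32(vdupq_n_u32(ref)); // ref in all 4 lanes
    uint8x16_t b = vreinterpretq_u8_u32(vld1q_u32(v));
    uint8x16_t d = vabdq_u8(a, b);              // |a - b| for each of 16 bytes
    uint32x4_t s = vpaddlq_u16(vpaddlq_u8(d));  // add the 4 bytes of each lane
    vst1q_u32(out, s);
#else
    // Scalar fallback with the same result, for targets without NEON.
    for (int k = 0; k < 4; ++k) {
        uint32_t sum = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            int p = static_cast<int>((ref >> shift) & 0xFF);
            int q = static_cast<int>((v[k] >> shift) & 0xFF);
            sum += static_cast<uint32_t>(std::abs(p - q));
        }
        out[k] = sum;
    }
#endif
}

// Hypothetical driver: same result as the double loop in the question,
// but comparing buf[i] against four candidates per call.
void findDissimilarSketch(const uint32_t buf[16], uint32_t& x, uint32_t& y)
{
    uint32_t best = 0;
    x = y = buf[0];  // defined output even if all 16 values are equal
    for (int i = 0; i < 16; ++i) {
        for (int j = 0; j < 16; j += 4) {
            uint32_t d[4];
            diss4(buf[i], buf + j, d);
            for (int k = 0; k < 4; ++k) {
                if (d[k] > best) {
                    best = d[k];
                    x = buf[i];
                    y = buf[j + k];
                }
            }
        }
    }
}
```

Since diss is symmetric, only pairs with j > i really need to be checked, so this does roughly twice the necessary work. Whether the max search can also stay in NEON registers is part of what I'm asking.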
