senseiwa

Reputation: 2499

Optimize blockwise bit operations: base-4 numbers

This should be a fun question, at least for me.

My intent is to manipulate base-4 numbers, encoded in an unsigned integer. Each two-bit block then represents a single base-4 digit, starting from the least significant bit:

01 00 11 = base4(301)
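For instance, reading a digit back out of this encoding is just a shift and a mask (a tiny scalar sketch; the helper name is mine):

```cpp
#include <cstdint>

// Digit i of v sits in the two bits at position 2*i.
uint32_t digit(uint32_t v, int i)
{
    return (v >> (2 * i)) & 3u;
}
```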

I'd like to optimize my code using SSE instructions, because I'm not sure how well I scored here; maybe poorly.

The code starts from strings (and uses them to check correctness), and implements conversion to binary, conversion back to a string, and digit reversal:

Any hints are more than welcome!

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

uint32_t tobin(const std::string& s)
{
    uint32_t v, bin = 0;

    // Convert to binary
    for (std::size_t i = 0; i < s.size(); i++)
    {
        switch (s[i])
        {
            case '0':
                v = 0;
                break;

            case '3':
                v = 3;
                break;

            case '1':
                v = 1;
                break;

            case '2':
                v = 2;
                break;

            default:
                throw "UNKNOWN!";
        }

        bin = bin | (v << (i << 1));
    }

    return bin;
}
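Since '0'..'3' are consecutive in ASCII, the switch above can collapse into a subtraction plus a single range check (a hedged scalar sketch; the name tobin_fast is mine):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Branch-light variant of tobin: '0'..'3' are consecutive in ASCII,
// so the digit value is just s[i] - '0'; anything outside 0..3
// (including characters below '0', which wrap around) is rejected.
uint32_t tobin_fast(const std::string& s)
{
    uint32_t bin = 0;

    for (std::size_t i = 0; i < s.size(); i++)
    {
        uint32_t v = static_cast<uint32_t>(s[i]) - '0';
        if (v > 3)
            throw std::invalid_argument("not a base-4 digit");
        bin |= v << (i * 2);
    }
    return bin;
}
```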

std::string tostr(int size, const uint32_t v)
{
    std::string b;

    // Extract each base-4 digit and append it as a character
    for (int i = 0; i < size; i++)
    {
        unsigned char c = static_cast<unsigned char>((v >> (i << 1)) & 3u);

        switch (c)
        {
            case 0:
                b += '0';
                break;

            case 3:
                b += '3';
                break;

            case 1:
                b += '1';
                break;

            case 2:
                b += '2';
                break;

            default:
                throw "UNKNOWN!";
        }
    }

    return b;
}

uint32_t revrs(int size, const uint32_t v)
{
    uint32_t bin = 0;

    // Reverse the order of the base-4 digits
    for (int i = 0; i < size; i++)
    {
        uint32_t q = (v >> (i << 1)) & 3u;

        bin = bin | (q << ((size - i - 1) << 1));
    }

    return bin;
}
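As a hint toward the "blockwise" optimization the question asks about: the per-digit loop in revrs can be replaced by three scalar swap steps that reverse all 16 digits at once, followed by one shift to realign shorter numbers (my sketch, name revrs_fast; assumes 1 <= size <= 16):

```cpp
#include <cstdint>

// Reverse all 16 two-bit digits of a 32-bit word: swap adjacent
// digits, then nibbles, then bytes; finally drop the unused digits.
// Requires 1 <= size <= 16 (size == 0 would shift by 32, which is UB).
uint32_t revrs_fast(int size, uint32_t v)
{
    v = ((v & 0x33333333u) << 2) | ((v >> 2) & 0x33333333u); // swap adjacent digits
    v = ((v & 0x0F0F0F0Fu) << 4) | ((v >> 4) & 0x0F0F0F0Fu); // swap nibbles in bytes
    v = (v >> 24) | ((v >> 8) & 0x0000FF00u)                 // swap bytes
      | ((v << 8) & 0x00FF0000u) | (v << 24);
    return v >> (32 - 2 * size);                             // realign short numbers
}
```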

bool ckrev(std::string s1, std::string s2)
{
    std::reverse(s1.begin(), s1.end());

    return s1 == s2;
}

int main(int argc, char* argv[])
{
    // Binary representation of base-4 number
    uint32_t binr;

    std::vector<std::string> chk { "123", "2230131" };

    for (const auto &s : chk)
    {
        std::string b, r;
        uint32_t    c;

        binr = tobin(s);
        b    = tostr(s.size(), binr);
        c    = revrs(s.size(), binr);
        r    = tostr(s.size(), c);

        std::cout << "orig " << s << std::endl;
        std::cout << "binr " << std::hex << binr << " string " << b << std::endl;
        std::cout << "revs " << std::hex << c    << " string " << r << std::endl;
        std::cout << ">>> CHK  " << (s == b) << " " << ckrev(r, b) << std::endl;
    }

    return 0;
}

Upvotes: 1

Views: 394

Answers (2)

stgatilov

Reputation: 5533

I'll solve the problem of converting a 32-bit integer to a base-4 string with SSE. The problem of removing leading zeros is not considered, i.e. the base-4 strings always have length 16.

General thoughts

Clearly, we have to extract pairs of bits in vectorized form. To do that, we can perform some byte manipulations and bitwise operations. Let's see what we can do with SSE:

  1. A single intrinsic _mm_shuffle_epi8 (from SSSE3) allows you to shuffle 16 bytes in absolutely any way you desire. Some well-structured shuffles and register mixtures can be done with simpler SSE2 instructions, but it's important to remember that any in-register byte shuffle can be done with one cheap instruction.

  2. Shuffling does not help to change the indices of bits within a byte. To move chunks of bits around, we usually use bit shifts. Unfortunately, there is no way in SSE to shift different elements of an XMM register by different amounts. As @PeterCordes mentioned in the comments, such instructions exist in AVX2 (e.g. _mm_sllv_epi32), but they operate on at least 32-bit granularity.

From ancient times we have been taught that bit shifts are fast and multiplication is slow. Today arithmetic is so much accelerated that this is no longer true: in SSE, shifts and multiplications seem to have equal throughput, although multiplications have higher latency.

  3. Using multiplication by powers of two, we can shift left different elements of a single XMM register by different amounts. There are instructions like _mm_mulhi_epi16 which allow 16-bit granularity, and the instruction _mm_maddubs_epi16 allows 8-bit granularity of shifts. A right shift can be done via a left shift just the same way people do division via multiplication: shift left by 16-k, then shift right by two bytes (recall that any byte shuffling is cheap).

We actually want to do 16 different bit shifts. If we use multiplication with 16-bit granularity, then we'll have to use at least two XMM registers for the shifting, which can be merged together afterwards. Alternatively, we can try multiplication with 8-bit granularity to do everything in a single register.
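The multiply-as-shift trick can be sanity-checked in scalar code. The helper below (name mine) models one lane of _mm_mulhi_epu16: keeping the high 16 bits of a 16x16-bit product, so multiplying by 1 << k amounts to a right shift by 16 - k:

```cpp
#include <cstdint>

// One lane of _mm_mulhi_epu16: the high 16 bits of the 32-bit product.
// With b = 1 << k this computes a >> (16 - k), which is how a single
// multiply applies a different shift amount to every lane.
uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}
```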

16-bit granularity

First of all, we have to move the 32-bit integer into the lower 4 bytes of an XMM register. Then we shuffle bytes so that each 16-bit element of the register contains one byte of the input:

|abcd|0000|0000|0000|   before shuffle (little-endian)
|a0a0|b0b0|c0c0|d0d0|   after shuffle (to low halves)
|0a0a|0b0b|0c0c|0d0d|   after shuffle (to high halves)

Then we can call _mm_mulhi_epi16 to shift each element right by k = 1..16. Actually, it is more convenient to put the input bytes into the high halves of the 16-bit elements, so that we can shift left by k = -8..7. As a result, we want some bytes of the XMM register to contain the pairs of bits defining base-4 digits (as their low bits). After that we can remove the unnecessary high bits with _mm_and_si128 and shuffle the valuable bytes into their proper places.

Since only 8 shifts can be done at once with 16-bit granularity, we have to do the shifting part twice. Then we combine the two XMM registers into one.

Below you can see the code using this idea. It is slightly optimized: there is no byte shuffling after the bit shifts.

__m128i reg = _mm_cvtsi32_si128(val);
__m128i bytes = _mm_shuffle_epi8(reg, _mm_setr_epi8(-1, 0, -1, 0, -1, 1, -1, 1, -1, 2, -1, 2, -1, 3, -1, 3));
__m128i even = _mm_mulhi_epu16(bytes, _mm_set1_epi32(0x00100100));  //epi16:  1<<8,  1<<4  x4 times
__m128i odd  = _mm_mulhi_epu16(bytes, _mm_set1_epi32(0x04004000));  //epi16: 1<<14, 1<<10  x4 times
even = _mm_and_si128(even, _mm_set1_epi16(0x0003));
odd  = _mm_and_si128(odd , _mm_set1_epi16(0x0300));
__m128i res = _mm_xor_si128(even, odd);
res = _mm_add_epi8(res, _mm_set1_epi8('0'));
_mm_storeu_si128((__m128i*)s, res);
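To validate a kernel like the one above, a scalar reference that produces the same 16 digit characters may help (a sketch; the name base4_ref is mine):

```cpp
#include <cstdint>

// Scalar reference for the SSE kernels: digit i of val (the two bits
// at position 2*i) becomes the character s[i], one of '0'..'3',
// for all 16 digit positions.
void base4_ref(uint32_t val, char s[16])
{
    for (int i = 0; i < 16; i++)
        s[i] = static_cast<char>('0' + ((val >> (2 * i)) & 3u));
}
```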

8-bit granularity

First of all we move our 32-bit integer into an XMM register, of course. Then we shuffle bytes so that each byte of the result equals the input byte containing the two bits wanted at that place:

|abcd|0000|0000|0000|   before shuffle (little-endian)
|aaaa|bbbb|cccc|dddd|   after shuffle

Now we use _mm_and_si128 to filter the bits: in each byte only the two wanted bits must remain. After that we only need to shift each byte right by 0/2/4/6 bits. This could be achieved with the intrinsic _mm_maddubs_epi16, which can shift 16 bytes at once. Unfortunately, I do not see how to shift all the bytes properly with this instruction alone, but at least we can shift each odd byte right by 2 bits (even bytes remain as they are). Then the bytes with indices 4k+2 and 4k+3 can be shifted right by 4 bits with a single _mm_madd_epi16 instruction.

Here is the resulting code:

__m128i reg = _mm_cvtsi32_si128(val);
__m128i bytes = _mm_shuffle_epi8(reg, _mm_setr_epi8(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3));
__m128i twobits = _mm_and_si128(bytes, _mm_set1_epi32(0xC0300C03));         //epi8: 3<<0, 3<<2, 3<<4, 3<<6  x4 times
twobits = _mm_maddubs_epi16(twobits, _mm_set1_epi16(0x4001));               //epi8: 1<<0, 1<<6  x8 times
__m128i res = _mm_madd_epi16(twobits, _mm_set1_epi32(0x10000001));          //epi16: 1<<0, 1<<12  x4 times
res = _mm_add_epi8(res, _mm_set1_epi8('0'));
_mm_storeu_si128((__m128i*)s, res);
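The _mm_maddubs_epi16 step is the subtle part. A scalar model of one 16-bit lane (helper name mine) shows why the multiplier pair (1, 1 << 6) leaves the even byte alone while landing the odd byte in the high byte shifted right by 2:

```cpp
#include <cstdint>

// One 16-bit lane of _mm_maddubs_epi16: multiplies unsigned bytes
// a0, a1 by signed bytes b0, b1 and adds the products with signed
// saturation. With (b0, b1) = (1, 64): result = a0 + (a1 << 6), so
// the high byte of the lane equals a1 >> 2.
int16_t maddubs_lane(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t r = a0 * b0 + a1 * b1;
    if (r >  32767) r =  32767;   // saturate to int16 range
    if (r < -32768) r = -32768;
    return static_cast<int16_t>(r);
}
```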

P.S.

Both solutions use a lot of compile-time-constant 128-bit values. They are not encoded into x86 instructions, so the processor has to load them from memory (most likely L1 cache) each time they are used. However, if you run many conversions in a loop, the compiler should load all these constants into registers before the loop (I hope).

Here you can find the full code (without timing), including implementation of the str2bin solution by @YvesDaoust.

Upvotes: 2

user1196549

Reputation:

This is a little challenging with SSE because there is little provision for bit packing (you want to take two bits from every character and pack them contiguously). Anyway, the special instruction _mm_movemask_epi8 can help you.

For the string-to-binary conversion, you can proceed as follows:

  • load the 16 characters string (pad with zeroes or clear after the load if necessary);

  • subtract bytewise an ASCII '0';

  • compare bytewise 'unsigned greater than' against a string of 16 '3' bytes; this will set a byte to 0xFF wherever there is an invalid character;

  • use _mm_movemask_epi8 to detect such a character in the packed short value

If all is fine, you now need to pack the bit pairs. For this you need to

  • duplicate the 16 bytes

  • shift the bits of weight 1 and 2, left by 7 or 6 positions, to make them most significant (_mm_sll_epi16. There is no epi8 version, but bits from one element becoming garbage in the low bits of another element isn't important for this.)

  • interleave them (_mm_unpack..._epi8, once with lo and once with hi)

  • store the high bits of those two vectors into shorts with _mm_movemask_epi8.
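The steps above can be sketched with SSE2 intrinsics only (the function name is mine; it assumes the input holds exactly 16 readable characters, digit 0 ending up in the lowest two bits):

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Pack 16 base-4 characters into a 32-bit value; false on bad input.
bool str16_to_base4(const char* s, uint32_t* out)
{
    __m128i chars  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s));
    __m128i digits = _mm_sub_epi8(chars, _mm_set1_epi8('0'));

    // unsigned "digit <= 3" test: max(digit, 3) == 3 iff the byte is valid
    // (characters below '0' wrap around to large values and are caught too)
    __m128i ok = _mm_cmpeq_epi8(_mm_max_epu8(digits, _mm_set1_epi8(3)),
                                _mm_set1_epi8(3));
    if (_mm_movemask_epi8(ok) != 0xFFFF)
        return false;

    // move bit 0 (resp. bit 1) of every digit into its byte's MSB;
    // bits crossing from an even byte into the odd byte's low bits
    // never reach an MSB that movemask reads
    __m128i b0 = _mm_slli_epi16(digits, 7);
    __m128i b1 = _mm_slli_epi16(digits, 6);

    // interleave so each movemask reads bit0,bit1,bit0,bit1,... per digit
    uint32_t lo = static_cast<uint32_t>(
        _mm_movemask_epi8(_mm_unpacklo_epi8(b0, b1)));
    uint32_t hi = static_cast<uint32_t>(
        _mm_movemask_epi8(_mm_unpackhi_epi8(b0, b1)));
    *out = lo | (hi << 16);
    return true;
}
```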

For the binary-to-string conversion, I can't think of an SSE implementation that makes sense, as there is no counterpart of _mm_movemask_epi8 that would allow you to unpack efficiently.

Upvotes: 2
