Niet the Dark Absol
Niet the Dark Absol

Reputation: 324620

Quick bitwise comparison

I have two bytes that are made up of two 4-bit numbers packed together. I need to know if either of the two numbers from the first byte matches either of the numbers from the second byte. Zero is considered null and shouldn't match itself.

Obviously, I can do this by unpacking the numbers and comparing them one by one:

a = 0b10100101;
b = 0b01011111; // should have a match because 0101 is in both

a1 = a>>4; a2 = a&15;
b1 = b>>4; b2 = b&15;

return (a1 != 0 && (a1 == b1 || a1 == b2)) || (a2 != 0 && (a2 == b1 || a2 == b2));

//     ( true   && (  false  ||   false )) || ( true   && (  true   ||   false ))
//     ( true   &&         false         ) || ( true   &&         true          )
//            false                        ||         true
//                                        TRUE

However I'm just wondering if anyone knows of a cleaner way to do this?

Upvotes: 5

Views: 159

Answers (3)

amdn
amdn

Reputation: 11582

Here is a solution in C++ that is more concise and 1.6 times faster. It generates code that is friendlier for high end microprocessors with deep pipelines and complex branch prediction logic. It generates true/false with no comparison/branches and no table lookup (no data cache miss).

A nibble has 4 bits and therefore holds one of 16 values, I map the two nibbles in each of the inputs to an unsigned value (which has at least 16 bits) with bits set in the corresponding bit positions indicating both nibble values present in the input. I then AND the two bitsets, thus computing the intersection of the sets. The last AND discards any matches with nibble 0.

inline unsigned set( unsigned char X ) {
    return (1 << (X & 15)) | (1 << (X >> 4));
}

// Return true if a nibble in 'A' matches a non-null nibble in 'B'
inline bool match( unsigned char A, unsigned char B ) {
    return set( A ) & set( B ) & ~set( 0 );
}

I've timed it on an Intel Xeon X5570 @ 2.93GHz and it is 1.6x faster than the original in the question. Here's the code I used to time it:

#include <time.h>
#include <iostream>

bool original( unsigned char A, unsigned char B ) {
    unsigned char a1 = A >> 4;
    unsigned char a2 = A & 15;
    unsigned char b1 = B >> 4;
    unsigned char b2 = B & 15;

    return (a1 != 0 && (a1 == b1 || a1 == b2)) || (a2 != 0 && (a2 == b1 || a2 == b2));
}

static inline unsigned set( unsigned char X ) {
    return (1 << (X & 15)) | (1 << ((X >> 4)&15));
}

bool faster( unsigned char A, unsigned char B ) {
    return set( A ) & set( B ) & ~set( 0 );
}

class Timer {
    size_t _time;
    size_t & _elapsed;
    size_t nsec() {
        timespec ts;
        clock_gettime( CLOCK_REALTIME, &ts );
        return ts.tv_sec * 1000 * 1000 * 1000 + ts.tv_nsec;
    }
public:
    Timer(size_t & elapsed) : _time(nsec()), _elapsed(elapsed) {}
    ~Timer() { _elapsed = nsec() - _time; }
};

int main()
{
    size_t original_nsec, faster_nsec;
    const size_t iterations = 200000000ULL;
    size_t count = 0;

    { Timer t(original_nsec);
        for(size_t i=0; i < iterations; ++i) {
            count += original( 0xA5 , i & 0xFF );
        }
    }

    { Timer t(faster_nsec);
        for(size_t i=0; i < iterations; ++i) {
            count += faster( 0xA5 , i & 0xFF );
        }
    }

    std::cout << double(original_nsec) / double(faster_nsec)
              << "x faster" << std::endl;

    return count > 0 ? 0 : 1;
}

Here's the output:

$ g++ -o match -O3 match.cpp -lrt && ./match
1.61564x faster
$ 

Upvotes: 0

Keith Randall
Keith Randall

Reputation: 23265

Precompute the answer and store it in a lookup table. The key to your table is 16 bits ((a<<8)+b). It need only be 1 bit output (uses 8K), but you could use 8 bits for simplicity (uses 64K).

Upvotes: 2

paxdiablo
paxdiablo

Reputation: 881223

A cleaner way would be to get rid of that hard-to-parse expression and make the code more readable.

def sameNybble (a, b):
    # Get high and low nybbles.

    ahi = (a >> 4) & 15 ; alo = a & 15;
    bhi = (b >> 4) & 15 ; blo = b & 15;

    # Only check ahi if non-zero, then check against bhi/blo

    if ahi != 0:
        if ahi == bhi or ahi == blo:
            return true

    # Only check alo if non-zero, then check against bhi/blo

    if alo != 0:
        if alo == bhi or alo == blo:
            return true

    # No match

    return false

Any decent optimising compiler will basically give you the same underlying code so it's sometimes better to optimise for readability.

Upvotes: 0

Related Questions