Curious
Curious

Reputation: 557

Copy bits of uint64_t into two uint64_t at specific location

I have an input uint64_t X and number of its N least significant bits that I want to write into the target Y, Z uint64_t values starting from bit index M in the Z. Unaffected parts of Y and Z should not be changed. How I can implement it efficiently in C++ for the latest intel CPUs?

It should be efficient for execution in loops. I guess that it requires to have no branching: the number of used instructions is expected to be constant and as small as possible.

M and N are not fixed at compile time. M can take any value from 0 to 63 (target offset in Z), N is in the range from 0 to 64 (number of bits to copy).

illustration:

illustration

Upvotes: 0

Views: 257

Answers (1)

Aki Suihkonen
Aki Suihkonen

Reputation: 20087

There's at least a four instruction sequence available on reasonable modern IA processors.

X &= (1 << (N+1)) - 1;     // mask off the upper bits
//  bzhi    rax, rdi, rdx

Z = X << M;                
//  shlx    rax, rax, rsi

Y = X >> (64 - M);         
//  neg     sil
//  shrx    rax, rax, rsi

The value M=0 causes a bit of pain, as Y would need to be zero in that case and also the expression N >> (64-M) would need sanitation.

One possibility to overcome this is

x = bzhi(x, n);
y = rol(x,m);
y = bzhi(y, m);   // y &= ~(~0ull << m);
z = shlx(x, m);   // z = x << m;

As OP actually wants to update the bits, one obvious solution would be to replicate the logic for masks:

xm = bzhi(~0ull, n);
ym = rol(xm, m);
ym = bzhi(ym, m);
zm = shlx(xm, m);

However, clang seems to produce something like 24 instructions total with the masks applied:

Y = (Y & ~xm) | y; // |,+,^  all possible
Z = (Z & ~zm) | z;

It is likely then better to change the approach:

x2 = x << (64-N);  // align xm to left
y2 = y >> y_shift; // align y to right
y = shld(y2,x2, y_shift); // y fixed

Here y_shift = max(0, M+N-64)

Fixing Z is slightly more involved, as Z can be combined of three parts:

zzzzz.....zzzzXXXXXXXzzzzzz, where m=6, n=7

That should be doable with two double shifts as above.

Upvotes: 2

Related Questions