Reputation: 65006
Given a long
with bytes WXYZ
(where each letter is a byte), I would like some fast bit twiddling code that will create two longs
with the same bytes as the original, but interleaved with the 0 byte.
For example, given the long
with value ABCDEFGH
(each letter being one byte), produce the two longs:
0A0B0C0D
0E0F0G0H
Something equivalent to, but faster than:
long result1 = expand((int)(input >>> 32));
long result2 = expand((int)input);
long expand(int inputInt) {
long input = intputInt;
return
(input & 0x000000FF) |
(input & 0x0000FF00) << 8 |
(input & 0x00FF0000) << 16 |
(input & 0xFF000000) << 24;
}
Upvotes: 1
Views: 1171
Reputation: 106391
The following is about 25% faster for me (Java 7, benchmarked using Google Caliper), YMMV may vary according to your compiler of course:
long a = (input | (input << 16));
long result = (a & 0xFF000000FFL) + ((a & 0xFF000000FF00L) <<8);
The idea is to use a bit of extra parallelism vs. the original approach.
The first line is a neat trick that produces garbage in bits 17-32, but you don't care as you are going to mask it out anyway. :-)
Upvotes: 3
Reputation: 3852
In C++ you could try to use a union:
typedef union
{
char bytes[8];
long value;
} PlatformSpecificSolution;
long expand(int valueInt)
{
PlatformSpecificSolution pss;
pss.value = valueInt;
pss.bytes[6] = pss.bytes[3]; pss.bytes[3] = 0;
pss.bytes[4] = pss.bytes[2]; pss.bytes[2] = 0;
pss.bytes[2] = pss.bytes[1]; pss.bytes[1] = 0;
// pss.bytes[0] = pss.bytes[0];
return pss.value;
}
I have no idea if this is faster (you will have to run benchmarks on the platforms that you want to support). This solution is definitely more error prone. You should always ask yourself, if the performance benefit outways the disadvantage of less maintainable code.
Upvotes: 0
Reputation: 1601
long expand(int inputInt) {
long input = intputInt;
return
(input & 0x000000FF) << 8 |
(input & 0x0000FF00) << 16 |
(input & 0x00FF0000) << 24 |
(input & 0xFF000000) << 32;
}
Upvotes: 0