Reputation: 88
Assuming a little endian architecture and a large (unsigned char *) memory area, I want to be able to interpret any n <= sizeof(size_t) bytes anywhere in this area as an integer (size_t) value. I want to do it as fast as possible assuming gcc and the x64 architecture, but I would also like to be able to offer safer code for other possible scenarios. What are the possible solutions?
Is it possible to do it faster than the following?
static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
switch(len) { /* up to sizeof(size_t) bytes after addr has to be allocated */
case 5: return *(size_t *) addr & 0x0ffffffffffU;
case 4: return *(size_t *) addr & 0x0ffffffffU;
case 3: return *(size_t *) addr & 0x0ffffffU;
case 2: return *(size_t *) addr & 0x0ffffU;
case 1: return *(size_t *) addr & 0x0ffU;
case 6: return *(size_t *) addr & 0x0ffffffffffffU;
case 7: return *(size_t *) addr & 0x0ffffffffffffffU;
case 8: return *(size_t *) addr & 0x0ffffffffffffffffU;
}
return 0;
}
(The order of the branches reflects the actual probability distribution of possible len values, but in fact it does not seem to have any significant impact: the compiler probably uses some constant time solution.) Moreover, am I right that this is UB according to the standard, but that despite this I can expect from gcc either a "correct" interpretation or, with -fstrict-aliasing -Wstrict-aliasing=2, a warning or error (because the pointer aliasing is clearly visible to the compiler) if the compiler behavior should happen to change in the future?
A bit slower (I have compared only the whole program) solution is the following:
static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
union { size_t num; unsigned char bytes[8]; } number = { 0 };
switch(len) {
case 8: number.bytes[7] = addr[7];
case 7: number.bytes[6] = addr[6];
case 6: number.bytes[5] = addr[5];
case 5: number.bytes[4] = addr[4];
case 4: number.bytes[3] = addr[3];
case 3: number.bytes[2] = addr[2];
case 2: number.bytes[1] = addr[1];
case 1: number.bytes[0] = addr[0];
}
return number.num;
}
Am I right that with this code no alignment problems can arise and that, although it is still not strictly correct because I write one member of the union and read another (see the discussion around https://stackoverflow.com/a/36705613), this "union-based" approach is so widespread that it is supported by almost all compilers? Is there any faster "almost correct" solution?
Finally, is there any faster really correct solution than using shift and add (thanks to Hurkyl for pointing out, of course I tried it but forgot!), which is somewhere between the above and the slowest memcpy?
static inline size_t bytes2num(const unsigned char * const addr, size_t const len) {
size_t num = 0;
int i;
for (i = len - 1; i >= 0; --i) {
num <<= 8;
num |= addr[i];
}
return num;
}
Footnote: I did not mention a particular revision of the standard, and moreover I added the c++ tag: I would like my code to be compilable under any standard from C89 onward, thus I'd like to limit myself to the common subset of the standards (possibly with some optional definitions like an empty #define inline etc.)
Upvotes: 3
Views: 99
Reputation: 144951
Your last solution is probably the simplest. But you should initialize num to 0 and use addr instead of &addr:
#include <string.h> /* for memcpy */
static inline size_t bytes2num(const unsigned char *addr, size_t len) {
size_t num = 0;
memcpy(&num, addr, len);
return num;
}
Note however that if len is greater than sizeof(num), the above code invokes undefined behavior. If you need a safe solution, you need an extra test:
static inline size_t bytes2num(const unsigned char *addr, size_t len) {
size_t num = 0;
memcpy(&num, addr, len <= sizeof(num) ? len : sizeof(num));
return num;
}
Note also that this method assumes integers are stored in little endian order (least significant byte first).
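If, as the question requires, at least sizeof(size_t) bytes are always readable at addr, a fixed-size memcpy followed by a mask is another variant that may be worth measuring. The following is only a sketch under that assumption. The name bytes2num_masked and the self-check function are hypothetical and appear in neither the question nor the code above. gcc typically compiles a fixed-size memcpy like this to a single load at -O2, but that is worth confirming in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant: load a full size_t with a fixed-size memcpy,
 * then mask off the bytes beyond len.
 * Assumes a little endian host, 8-bit bytes, and that sizeof(size_t)
 * bytes are readable at addr (same precondition as the cast version). */
static size_t bytes2num_masked(const unsigned char *addr, size_t len) {
    size_t num;
    size_t mask;

    memcpy(&num, addr, sizeof num);  /* no aliasing or alignment problem */
    if (len >= sizeof num)
        return num;                  /* avoid shifting by the full width */
    mask = ((size_t)1 << (len * 8)) - 1;
    return num & mask;
}

/* Self-check; the expected values hold on little endian hosts only. */
static void bytes2num_masked_selfcheck(void) {
    static const unsigned char buf[16] = {
        0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88
    };
    assert(bytes2num_masked(buf, 0) == 0);
    assert(bytes2num_masked(buf, 1) == 0x11);
    assert(bytes2num_masked(buf, 3) == 0x332211);
}
```

Unlike the cast in the question, this does not read through a size_t pointer, so the strict aliasing and alignment concerns do not apply. The requirement that the buffer be long enough remains.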
A portable solution, still assuming that the bytes are stored in little endian order, eight-bit bytes, and len <= sizeof(size_t), is this loop:
static inline size_t bytes2num(const unsigned char *addr, size_t len) {
size_t num = 0;
for (size_t i = 0; i < len; i++) {
num |= (size_t)addr[i] << (i * 8);
}
return num;
}
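As a quick sanity check of the loop, here is a small sketch with a few known byte patterns. The copy is renamed bytes2num_loop (a hypothetical name), and i is declared at the top of the block so that it also compiles as C89, which the question asks for. Because the shifts operate on values rather than on memory, the results should not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop under a hypothetical name, written in C89 style.
 * Requires len <= sizeof(size_t) and 8-bit bytes. */
static size_t bytes2num_loop(const unsigned char *addr, size_t len) {
    size_t num = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        num |= (size_t)addr[i] << (i * 8);
    }
    return num;
}

/* Self-check; assumes sizeof(size_t) >= 4. */
static void bytes2num_loop_selfcheck(void) {
    static const unsigned char buf[4] = { 0x11, 0x22, 0x33, 0x44 };

    assert(bytes2num_loop(buf, 0) == 0);
    assert(bytes2num_loop(buf, 1) == 0x11);
    assert(bytes2num_loop(buf, 2) == 0x2211);
    assert(bytes2num_loop(buf, 4) == 0x44332211UL);
}
```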
If your code uses this function with constant values for len, it will probably be expanded inline without a loop, and possibly as a single instruction, depending on the compiler's configuration and abilities.
Upvotes: 1