Reputation: 4528
I have come across this CRC32 code and was curious why the author would choose to use
crc = crc ^ ~0U;
instead of
crc = ~crc;
As far as I can tell, they are equivalent.
I have even disassembled the two versions in Visual Studio 2010.
Not optimized build:
crc = crc ^ ~0U;
009D13F4 mov eax,dword ptr [crc]
009D13F7 xor eax,0FFFFFFFFh
009D13FA mov dword ptr [crc],eax
crc = ~crc;
011C13F4 mov eax,dword ptr [crc]
011C13F7 not eax
011C13F9 mov dword ptr [crc],eax
I also cannot justify the code by thinking about the number of cycles that each instruction takes since both should be taking 1 cycle to complete. In fact, the xor might have a penalty by having to load the literal from somewhere, though I am not certain of this.
So I'm left thinking that it is possibly just a preferred way to describe the algorithm, rather than an optimization... Would that be correct?
Edit 1:
Since I just realized that the type of the crc
variable is probably important to mention I am including the whole code (less the lookup table, way too big) here so you don't have to follow the link.
uint32_t crc32(uint32_t crc, const void *buf, size_t size)
{
const uint8_t *p;
p = buf;
crc = crc ^ ~0U;
while (size--)
{
crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
}
return crc ^ ~0U;
}
Edit 2:
Since someone has brought up the fact that an optimized build would be of interest, I have made one and included it below.
Optimized build:
Do note that the whole function (included in the last edit below) was inlined.
// crc = crc ^ ~0U;
zeroCrc = 0;
zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
00971148 mov ecx,14h
0097114D lea edx,[ebp-40h]
00971150 or eax,0FFFFFFFFh
00971153 movzx esi,byte ptr [edx]
00971156 xor esi,eax
00971158 and esi,0FFh
0097115E shr eax,8
00971161 xor eax,dword ptr ___defaultmatherr+4 (973018h)[esi*4]
00971168 add edx,ebx
0097116A sub ecx,ebx
0097116C jne main+153h (971153h)
0097116E not eax
00971170 mov ebx,eax
// crc = ~crc;
zeroCrc = 0;
zeroCrc = crc32(zeroCrc, zeroBufferSmall, sizeof(zeroBufferSmall));
01251148 mov ecx,14h
0125114D lea edx,[ebp-40h]
01251150 or eax,0FFFFFFFFh
01251153 movzx esi,byte ptr [edx]
01251156 xor esi,eax
01251158 and esi,0FFh
0125115E shr eax,8
01251161 xor eax,dword ptr ___defaultmatherr+4 (1253018h)[esi*4]
01251168 add edx,ebx
0125116A sub ecx,ebx
0125116C jne main+153h (1251153h)
0125116E not eax
01251170 mov ebx,eax
Upvotes: 11
Views: 1894
Reputation: 141648
Something nobody's mentioned yet; if this code is being compiled on a machine with 16 bit unsigned int
then these two code snippets are different.
crc
is specified as a 32-bit unsigned integral type. ~crc
will invert all bits, but if unsigned int
is 16bit then crc = crc ^ ~0U
will only invert the lower 16 bits.
I don't know enough about the CRC algorithm to know whether this is intentional or a bug, perhaps hivert can clarify; although looking at the sample code posted by OP, it certainly does make a difference to the loop that follows.
NB. Sorry for posting this as an "answer" because it isn't an answer, but it's too big to just fit in a comment :)
Upvotes: 10
Reputation: 10667
The reason is the following: There is a lot of variant of CRC. Each one depend on a Z/Z2 polynomial which is used for an euclidian division. Usually is it implemented using the algorithm described In this paper by Aram Perez. Now depending on the polynomial you are using, there is a final XOR at the end of the algorithm which depend on the polynomial whose goal is to eliminate some corner case. It happens that for CRC32 this is the same as a global not but this is not true for all CRC. As an evidence on This web page you can read (emphasis mine):
Consider a message that begins with some number of zero bits. The remainder will never contain anything other than zero until the first one in the message is shifted into it. That's a dangerous situation, since packets beginning with one or more zeros may be completely legitimate and a dropped or added zero would not be noticed by the CRC. (In some applications, even a packet of all zeros may be legitimate!) The simple way to eliminate this weakness is to start with a nonzero remainder. The parameter called initial remainder tells you what value to use for a particular CRC standard. And only one small change is required to the crcSlow() and crcFast() functions:
crc remainder = INITIAL_REMAINDER;
The final XOR value exists for a similar reason. To implement this capability, simply change the value that's returned by crcSlow() and crcFast() as follows:
return (remainder ^ FINAL_XOR_VALUE);
If the final XOR value consists of all ones (as it does in the CRC-32 standard), this extra step will have the same effect as complementing the final remainder. However, implementing it this way allows any possible value to be used in your specific application.
Upvotes: 7
Reputation: 11831
Just to add my own guess to the mix, x ^ 0x0001
keeps the last bit and flipps the others; to turn off the last bit use x & 0xFFFE
or x & ~0x0001
; to turn on the last bit unconditionally use x | 0x0001
. I.e., if you are doing lots of bit-twiddling, your fingers probably know those idioms and just roll them out without much thinking.
Upvotes: 1
Reputation: 57784
I think it is for the same reason that some write
const int zero = 0;
and others write
const int zero = 0x00000000;
Different people think different ways. Even about a fundamental operation.
Upvotes: 0
Reputation: 18974
I doubt there's any deep reason. Maybe that's how the author thought about it ("I'll just xor with all ones"), or perhaps how it was expressed in the algorithm definition.
Upvotes: 0