Reputation: 1

Will the results of Fast CRC using PCLMULQDQ be the same as for the naive implementation without further parameters?

As far as I understand, there's multiple ways to compute CRC depending on the polynomial of course but also on whether the data is bit-reflected, the initial value, or the final xor value. Let's say the "naive" implementation doesn't work on bit-reflected data and sets these values to 0.

Now, the algorithm to compute CRC for generic polynomials using PCLMULQDQ as described in https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf doesn't do all that as well, so I was wondering, will the results be the same as for the naive implementation? And if not, is there a way to check the correctness of my implementation for Fast CRC? I don't seem to find any online CRC calculators or any implementations for this specific algorithm.

Upvotes: 0

Answers (2)

rcgldr

Reputation: 28818

I converted Intel code from a github site to work with Visual Studio, added comments to the constants used for "folding" and Barrett reduction (comments are missing for some constants in the Intel source, with no explanation of how they were generated), and added a program to generate the constants. There are 6 sets of code: for 16, 32, and 64 bit non-reflected and reflected. For each of the 6 combinations, crcxxxa.asm is assembly code that uses PCLMULQDQ for folding data and to calculate the CRC for "Barrett reduction", crcxxxc.cpp is the test code and uses a table driven C++ CRC to verify the fast CRC, and crcxxxg.cpp is the code used to generate the constants used in crcxxxa.asm.

I'm using XMM registers and PCLMULQDQ, which takes two 64 bit operands and produces a 127 bit (not 128) product, since there are no carries to go into bit 127. Since it's a carryless multiply the only consequence of using it for reflected operands is the reflected product is effectively multiplied by 2, since bit 127, which is logical bit 0 for a reflected product is 0. This is taken care of by adjusting the constants.

For CRC64, the CRC polynomial and it's inverse are 65 bits. The constants in the code don't include the x^64 bit. To handle product = (A * B), where B is a 65 bit operand (stored in memory without the x^64 bit), product = (A << 64) XOR (A * (B & 0xFFFFFFFFFFFFFFFF)).

The assembly code takes an initial CRC as an input parameter, but does not include an xorout parameter. The post xor operation can be done after calling the assembly code, or the assembly code could be updated to take an xorout parameter (which would be in R9).

https://github.com/jeffareid/crc

Upvotes: 1

Mark Adler

Reputation: 112189

You can construct a CRC calculator using PCLMULQDQ that matches any definition of a CRC, reflected or not, initial or final conditions or not.

The paper gives examples for several different CRCs (the IEEE 802.3 degree-32 polynomial, both forward and reflected, and a degree-16 and degree-64 polynomial). So I don't know what you mean by "this specific algorithm".

You mention "my implementation". Perhaps you can post the result of applying your implementation to the nine bytes "123456789" (digits in ASCII).

Upvotes: 2

Will the results of Fast CRC using PCLMULQDQ be the same as for the naive implementation without further parameters?

Answers (2)

Related Questions