Burak Aydogan

Reputation: 3

CRC Calculation for 256 bit chunks

I'm using 256-bit variables (__m256i type) with Intel AVX2 intrinsics in the new version of my program. Previously, the data was processed in 64-bit chunks, so the _mm_crc32_u64 function was used for the CRC calculation:

crc = _mm_crc32_u64(seed,*chunk_64bit);

Now, to improve performance, I want to calculate the CRC of each 256-bit chunk (or at least each 128-bit chunk) separately. One way would be to apply _mm_crc32_u64 in a loop over the 64-bit values in each chunk, but I don't think that would be beneficial in terms of performance.
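For reference, the loop approach described above might look like the following minimal sketch. The function name crc32c_256 is hypothetical, the chunk is assumed to be 32 bytes in memory (for example a stored __m256i), and the `target` attribute assumes GCC or Clang on x86-64 so the file builds without -msse4.2.

```c
#include <nmmintrin.h>  /* SSE4.2: _mm_crc32_u64 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: feed one 256-bit chunk (32 bytes) through four
 * chained _mm_crc32_u64 calls. Each call depends on the previous result,
 * so the four instructions cannot overlap. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_256(uint32_t seed, const void *chunk)
{
    uint64_t words[4];
    uint64_t crc = seed;

    /* memcpy avoids alignment and aliasing problems */
    memcpy(words, chunk, sizeof words);
    for (int i = 0; i < 4; i++)
        crc = _mm_crc32_u64(crc, words[i]);
    return (uint32_t)crc;
}
```

The function returns the raw register value. With a seed of 0xFFFFFFFF and a final XOR with 0xFFFFFFFF it yields the standard CRC-32C of the 32 bytes.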

What is the best method for calculating the CRC of a 256-bit (or 128-bit) chunk that is faster overall than repeated _mm_crc32_u64 operations?

Upvotes: 0

Views: 677

Answers (1)

Mark Adler

Reputation: 112189

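You can interleave three crc32 instructions for higher performance. On recent Intel cores the instruction has a latency of about three cycles but one can be issued every cycle, so three independent CRC computations keep the unit busy. See this answer for code that does that. You can take it a step further by running that code on multiple processors and combining the resulting CRCs.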
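As a rough illustration of the interleaving idea, and not the code from the linked answer, the sketch below computes the CRCs of three independent 256-bit chunks at once. Because each chunk gets its own CRC, no combine step is needed; splitting a single buffer three ways would additionally require combining the partial CRCs. The function name crc32c_256_x3 is hypothetical, and the `target` attribute assumes GCC or Clang on x86-64.

```c
#include <nmmintrin.h>  /* SSE4.2: _mm_crc32_u64 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: CRC three independent 32-byte chunks in one loop.
 * The three dependency chains (ca, cb, cc) do not depend on each other,
 * so the CPU can overlap their crc32 instructions. */
__attribute__((target("sse4.2")))
static void crc32c_256_x3(uint32_t seed, const void *a, const void *b,
                          const void *c, uint32_t out[3])
{
    uint64_t wa[4], wb[4], wc[4];
    uint64_t ca = seed, cb = seed, cc = seed;

    memcpy(wa, a, sizeof wa);
    memcpy(wb, b, sizeof wb);
    memcpy(wc, c, sizeof wc);

    for (int i = 0; i < 4; i++) {
        ca = _mm_crc32_u64(ca, wa[i]);
        cb = _mm_crc32_u64(cb, wb[i]);
        cc = _mm_crc32_u64(cc, wc[i]);
    }

    out[0] = (uint32_t)ca;
    out[1] = (uint32_t)cb;
    out[2] = (uint32_t)cc;
}
```

Each output equals what a separate chain of four _mm_crc32_u64 calls over that chunk would produce. Whether this is measurably faster depends on the CPU and on how the compiler schedules the loop, so it is worth benchmarking.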

Upvotes: 1
