Reputation: 377
I am trying to implement a hash function for a simple hash table using the crc32c instruction in sse4.2 x86 extension. However I am not that comfortable with these kinds of problems yet so I have some issues.
I have looked up that there is a function unsigned int _mm_crc32_u8 (unsigned int crc, unsigned char v)
(source: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE4_2&expand=1283 ) which takes unsigned char and returns the hash. According to my understanding the crc
variable is there for error-checking purposes for leading bits which does not concern me (I can either set it to 0 or 0xffffffff and don't care about it).
However I have a string char *s
and want to compute the hash of it.
(in order of importance)
_mm_crc32_u8
to hash the whole string? Should I hash all of the chars in s
separately and bitwise XOR the results? I really have no clue._mm_crc32_u8
takes unsigned chars. Is it OK to just cast unsigned char on char to convert it like (unsigned char)s[0]
in this context? As far as I know it is because ASCII chars only have values 0 to 127 so it does not overflow in the casting process._mm_crc32_u{16,32,64}
), should I perhaps use these for better performance since they can hash up to 4 bytes at once?#include <nmmintrin.h>
provides the functions above. I am compiling it with clang
flag -msse4.2
Upvotes: 0
Views: 834
Reputation: 62583
I think, you misunderstand the way those functions work. They are expected to be called in sequence for every character in the string you need to calculate CRC for (or word if you use larger argument version of it, which you should).
But the idea remains the same: first you init CRC to 0, than you call CRC function in a loop, giving the previous value of CRC in the first argument, and hashed value in the second. You store the result in CRC and keep calm and carry on.
Here is code example:
uint64_t byte_crc32(unsigned int crc, const char** buf, size_t len) {
while (len > 0) {
crc = _mm_crc32_u8(crc, *(const unsigned char*)(*buf));
++*buf;
--len;
}
return crc;
}
uint64_t hw_crc32(unsigned int crc, const char** buf, size_t len) {
while (len > 0) {
crc = _mm_crc32_u16(crc, *(const uint16_t*)(*buf));
*buf += 2;
--len;
}
return crc;
}
uint64_t word_crc32(unsigned int crc, const char** buf, size_t len) {
while (len > 0) {
crc = _mm_crc32_u32(crc, *(const uint32_t*)(*buf));
*buf += 4;
--len;
}
return crc;
}
uint64_t dword_crc32(uint64_t crc, const char** buf, size_t len) {
while (len > 0) {
crc = _mm_crc32_u64(crc, *(const uint64_t*)(*buf));
*buf += 8;
--len;
}
return crc;
}
uint64_t crc(const char* buff, size_t len) {
const size_t dword_chunks = len / 8;
const size_t dword_diff = len % 8;
const size_t word_chunks = dword_diff / 4;
const size_t word_diff = dword_diff % 4;
const size_t hw_chunks = word_diff / 2;
const size_t hw_diff = word_diff % 2;
uint64_t crc = byte_crc32(0, &buff, hw_diff);
crc = hw_crc32(crc, &buff, hw_chunks);
crc = word_crc32(crc, &buff, word_chunks);
crc = dword_crc32(crc, &buff, dword_chunks);
return crc;
}
Upvotes: 2