Reputation: 629
I want to calculate the MD5 and SHA checksums of a series of huge files. Each file is about 1 GB, so I want this to be as fast as possible.
Could anyone recommend an efficient C++ library?
BTW, when reading a file with fread(buffer, sizeof(char), BUFFER_SIZE, fin), what value of BUFFER_SIZE is reasonable?
Upvotes: 1
Views: 7413
Reputation: 3991
Off the top of my head, I do not know of any fast C++ library. Computing a hash is relatively straightforward, so any C library will be just as easy to use (you can easily wrap it in a C++ class yourself). I found the following site, where the author implemented several hashing algorithms in x86 assembly and compared them to "official" C implementations of the same algorithms:
https://www.nayuki.io/page/fast-sha1-hash-implementation-in-x86-assembly
https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly
Those implementations should be a good starting point; after that you just have to make the file I/O as efficient as possible. Memory-mapped I/O is usually very efficient. Alternatively, you could go for a more complex design with two threads: one thread reads chunks from the file and the other hashes the data that has been read. The idea is to always keep the process doing something useful, i.e. hashes are calculated while waiting for more data to be read from the file.
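The two-thread idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `Fnv1a` is a stand-in for a real MD5/SHA implementation (it is not a cryptographic hash), and `ChunkQueue`, `hash_file_pipelined`, the queue depth of 4, and the chunk size are hypothetical names and values chosen for the example, not taken from any library.

```cpp
#include <condition_variable>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a real digest: 64-bit FNV-1a with an incremental update().
// Assumption for illustration only; swap in MD5/SHA for real use.
struct Fnv1a {
    std::uint64_t state = 14695981039346656037ULL;
    void update(const unsigned char* data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            state ^= data[i];
            state *= 1099511628211ULL;
        }
    }
};

// Bounded queue: the reader blocks when it is full, the hasher when it is empty.
class ChunkQueue {
public:
    explicit ChunkQueue(std::size_t max_chunks) : max_(max_chunks) {}

    void push(std::vector<unsigned char> chunk) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < max_; });
        q_.push(std::move(chunk));
        not_empty_.notify_one();
    }

    // Signals that no more chunks will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Returns false once the queue is closed and fully drained.
    bool pop(std::vector<unsigned char>& out) {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        not_full_.notify_one();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::queue<std::vector<unsigned char>> q_;
    std::size_t max_;
    bool closed_ = false;
};

// One thread reads chunks, the calling thread hashes them in file order.
bool hash_file_pipelined(const std::string& path, std::size_t chunk_size,
                         std::uint64_t& digest) {
    std::FILE* fin = std::fopen(path.c_str(), "rb");
    if (!fin) return false;

    ChunkQueue queue(4);
    bool read_ok = true;
    std::thread reader([&] {
        for (;;) {
            std::vector<unsigned char> chunk(chunk_size);
            std::size_t n = std::fread(chunk.data(), 1, chunk.size(), fin);
            if (n == 0) break;  // end of file or read error
            chunk.resize(n);
            queue.push(std::move(chunk));
        }
        read_ok = !std::ferror(fin);
        queue.close();
    });

    Fnv1a hasher;
    std::vector<unsigned char> chunk;
    while (queue.pop(chunk)) hasher.update(chunk.data(), chunk.size());

    reader.join();  // after join, read_ok is safe to read
    std::fclose(fin);
    digest = hasher.state;
    return read_ok;
}
```

A call such as `hash_file_pipelined("big.bin", 1 << 20, digest)` would hash the file in 1 MiB chunks. Whether the extra thread actually helps depends on the disk and the hash, so it is worth measuring against a plain single-threaded read loop.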
Upvotes: 2
Reputation: 129524
I personally would do FILE *pipe = popen("md5sum filename", "r"); [or something to that effect]. It is likely to be as fast as anything else, since a 1 GB file will take a little while to read and the calculation is unlikely to use much of your CPU time: most of the time will be spent waiting for the disk to load the file.
On my system, I created 6 files of 1 GB each, and it takes 2 seconds to checksum each file with md5sum (12 seconds for all 6 files).
Upvotes: 2
Reputation: 316
You could use OpenSSL. Search for Mysticial's answer about computing the MD5 of a large file under the question "How to create a md5 hash of a string in C?". If you look at the OpenSSL SHA docs, you will see that the MD5 and SHA functions are used in the same way.
Upvotes: 2