Reputation: 21
I wrote a simple test that does the following:
import hashlib
import os

rootDirectories = ["/bin", "/etc", "/sbin", "/lib", "/lib32", "/lib64"]

def walkDirectory(rootDirectory, ignoreList):
    try:
        directories = dict()
        for dirName, subdirList, fileList in os.walk(rootDirectory):
            if ignoreList:
                for ignore in ignoreList:
                    if ignore in fileList:
                        fileList.remove(ignore)
            directories[str(dirName)] = fileList
        return directories
    except Exception as e:
        print("Error while scanning directories and files: " + str(e))

def generateMd5(directory, blockSize):
    try:
        hash = hashlib.md5()
        with open(directory, "rb") as f:
            for block in iter(lambda: f.read(blockSize), b""):
                hash.update(block)  # OR: pass (I/O only)
        return hash.hexdigest()
    except Exception as e:
        print("Error while taking the hash values: " + str(e))

def generateBlake2(directory, blockSize):
    try:
        hash = hashlib.blake2b()
        with open(directory, "rb") as f:
            for block in iter(lambda: f.read(blockSize), b""):
                hash.update(block)  # OR: pass (I/O only)
        return hash.hexdigest()
    except Exception as e:
        print("Error while taking the hash values: " + str(e))
Blake2 is said to be ~40% faster than MD5 on Intel CPUs, yet in my tests performance is nearly identical for both algorithms across a wide range of block sizes.
Here are my test results; all averages were taken across 50 runs:
I/O ONLY - Empty hashing function (pass)
Hash: Block Size (bytes): Seconds (avg): Files Hashed (avg):
MD5 4096 1.0658537864685058 15666.0
MD5 8192 0.7631869792938233 15666.0
MD5 16384 0.679033899307251 15666.0
MD5 32768 0.6130096673965454 15666.0
MD5 65536 0.5926639556884765 15666.0
MD5 131072 0.6072390222549439 15666.0
MD5 262144 0.6025748205184936 15666.0
MD5 524288 0.629586148262024 15666.0
MD5 1048576 0.6911558246612549 15666.0
MD5 2097152 0.7710119438171387 15666.0
MD5 4194304 0.7501423931121827 15666.0
MD5 8388608 0.7613127708435059 15666.0
MD5 16777216 0.8549494647979736 15666.0
MD5 33554432 1.370493221282959 15666.0
MD5 67108864 1.3940581130981444 15666.0
Blake2 4096 0.9557385349273682 15666.0
Blake2 8192 0.775569143295288 15666.0
Blake2 16384 0.6793924331665039 15666.0
Blake2 32768 0.6285490798950195 15666.0
Blake2 65536 0.6092999029159546 15666.0
Blake2 131072 0.6079203844070434 15666.0
Blake2 262144 0.6007542181015014 15666.0
Blake2 524288 0.5933445692062378 15666.0
Blake2 1048576 0.5961050319671631 15666.0
Blake2 2097152 0.6041217613220214 15666.0
Blake2 4194304 0.6066651153564453 15666.0
Blake2 8388608 0.6075587749481202 15666.0
Blake2 16777216 0.6483844709396362 15666.0
Blake2 33554432 1.3640736722946167 15666.0
Blake2 67108864 1.3651361989974975 15666.0
I/O + HASH Generation
Hash: Block Size (bytes): Seconds (avg): Files Hashed (avg):
MD5 4096 3.174709644317627 15666.0
MD5 8192 2.9633057641983034 15666.0
MD5 16384 2.8538199186325075 15666.0
MD5 32768 2.8014142322540283 15666.0
MD5 65536 2.7751221895217895 15666.0
MD5 131072 2.7487613582611083 15666.0
MD5 262144 2.760332064628601 15666.0
MD5 524288 2.7834008264541628 15666.0
MD5 1048576 2.8470048809051516 15666.0
MD5 2097152 2.9602468061447142 15666.0
MD5 4194304 2.8882385206222536 15666.0
MD5 8388608 2.9489006996154785 15666.0
MD5 16777216 2.970426073074341 15666.0
MD5 33554432 3.6108012914657595 15666.0
MD5 67108864 3.5666714572906493 15666.0
Blake2 4096 3.3631218385696413 15666.0
Blake2 8192 3.1597275495529176 15666.0
Blake2 16384 3.0446608781814577 15666.0
Blake2 32768 3.0090284061431887 15666.0
Blake2 65536 2.9616037464141844 15666.0
Blake2 131072 2.9736635446548463 15666.0
Blake2 262144 2.959379949569702 15666.0
Blake2 524288 2.972061824798584 15666.0
Blake2 1048576 2.945613770484924 15666.0
Blake2 2097152 2.956157364845276 15666.0
Blake2 4194304 2.9658774614334105 15666.0
Blake2 8388608 2.970878791809082 15666.0
Blake2 16777216 3.064859108924866 15666.0
Blake2 33554432 3.8108808422088623 15666.0
Blake2 67108864 3.7806590032577514 15666.0
Are my results I/O constrained? Why would speed be so similar across the different algorithms and block sizes?
cat /proc/cpuinfo
vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: Micron_5100_MTFDDAK960TBY
Upvotes: 2
Views: 1010
Reputation: 18530
Your code is fine. This happens because the vast majority of time is not actually spent hashing. What you're seeing is almost entirely overhead:
Reading the actual data from disk, while comparatively fast on a solid-state drive, is still much slower than processing it in RAM. On top of that, iterating over all of those directories and files involves many, many lookups, even if all of the data is in the operating system's filesystem caches after the first run (which seems unlikely).
Most likely the main overhead comes from Python itself: it is an interpreted language, orders of magnitude slower at this kind of work than a more systems-level language. Not only is the bytecode being interpreted, but there are also many object updates and plenty of memory allocation, re-allocation and de-allocation, all of which happens under the hood with little way for you to influence it.
If you time your runs outside of the Python code, you get additional massive overhead from the start-up phase of the Python interpreter.
You'd probably get slightly more relevant results if you tested this on a single large file, simply because this eliminates a fair proportion of overhead.
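As a rough sketch (the path /tmp/bigfile is just a placeholder you would have to create yourself, and the helper names are made up), something like this times each algorithm against one large file and against the same data already held in memory, which separates I/O and per-call overhead from the actual hashing:

import hashlib
import time

def timeSingleFile(path, algorithm, blockSize=65536):
    # Hash one large file from disk and report elapsed wall-clock time.
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blockSize), b""):
            h.update(block)
    return time.perf_counter() - start, h.hexdigest()

def timeInMemory(data, algorithm):
    # Hash a buffer that is already in memory: no I/O, no per-file overhead.
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    h.update(data)
    return time.perf_counter() - start

# "/tmp/bigfile" is a placeholder; create it for example with:
#   dd if=/dev/urandom of=/tmp/bigfile bs=1M count=1024
path = "/tmp/bigfile"
data = open(path, "rb").read()

for algorithm in ("md5", "blake2b"):
    fileTime, _ = timeSingleFile(path, algorithm)
    memTime = timeInMemory(data, algorithm)
    print(algorithm, "file:", fileTime, "in-memory:", memTime)

If the in-memory times for MD5 and BLAKE2b differ noticeably while the whole-tree times in your tables do not, that confirms the per-file and interpreter overhead, not the hash itself, is dominating your measurements.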
Upvotes: 2