Asur

Reputation: 4017

How to hash a file with multiple algorithms at the same time in PHP?

I would like to hash a given file using multiple algorithms but now I'm doing it sequentially, like this:

return [
    hash_file('md5', $uri),
    hash_file('sha1', $uri),
    hash_file('sha256', $uri)
];

Is there any way to hash that file by opening only one stream instead of N, where N is the number of algorithms I want to use? Something like this:

return hash_file(['md5', 'sha1', 'sha256'], $uri);

Upvotes: 2

Views: 692

Answers (2)

Lawrence Cherone

Reputation: 46602

You can open a single file pointer, create one hash context per algorithm with hash_init(), feed every chunk you read to each context with hash_update(), and then call hash_final() on each context to get the resulting hashes. That way the file is opened and read only once.

<?php
function hash_file_multi($algos, $filename) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }

    if (!is_string($filename)) {
        throw new \InvalidArgumentException('Second argument must be a string');
    }

    if (!file_exists($filename)) {
        throw new \InvalidArgumentException('Second argument, file not found');
    }

    $result = [];
    $ctx = [];
    $fp = fopen($filename, "rb"); // binary mode, so bytes are hashed unmodified
    if ($fp) {
        // init hash contexts
        foreach ($algos as $algo) {
            $ctx[$algo] = hash_init($algo);
        }

        // calculate hash
        while (!feof($fp)) {
            $buffer = fgets($fp, 65536);
            if ($buffer === false) {
                break; // nothing more to read (EOF or read error)
            }
            foreach ($ctx as $key => $context) {
                hash_update($ctx[$key], $buffer);
            }
        }

        // finalise hash and store in return
        foreach ($algos as $algo) {
            $result[$algo] = hash_final($ctx[$algo]);
        }

        fclose($fp);
    } else {
        throw new \InvalidArgumentException('Could not open file for reading');
    }
    return $result;
}

$result = hash_file_multi(['md5', 'sha1', 'sha256'], $uri);

var_dump($result['md5'] === hash_file('md5', $uri)); //true
var_dump($result['sha1'] === hash_file('sha1', $uri)); //true
var_dump($result['sha256'] === hash_file('sha256', $uri)); //true

Also posted to PHP manual: http://php.net/manual/en/function.hash-file.php#122549

Upvotes: 6

Ilmari Karonen

Reputation: 50328

Here's a modification of Lawrence Cherone's solution* that reads the file only once, and works even for non-seekable streams such as STDIN:

<?php
function hash_stream_multi($algos, $stream) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }

    if (!is_resource($stream)) {
        throw new \InvalidArgumentException('Second argument must be a resource');
    }

    $result = [];
    $ctx = [];
    foreach ($algos as $algo) {
        $ctx[$algo] = hash_init($algo);
    }
    while (!feof($stream)) {
        $chunk = fread($stream, 1 << 20);  // read up to 1 MiB per call
        if ($chunk === false) {
            break;  // read error: stop instead of looping forever
        }
        foreach ($algos as $algo) {
            hash_update($ctx[$algo], $chunk);
        }
    }
    foreach ($algos as $algo) {
        $result[$algo] = hash_final($ctx[$algo]);
    }
    return $result;
}

// test: hash standard input with MD5, SHA-1 and SHA-256
$result = hash_stream_multi(['md5', 'sha1', 'sha256'], STDIN);
print_r($result);


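It works by reading the data from the input stream with fread() in chunks of up to one mebibyte (which should give a reasonable balance between performance and memory use) and feeding each chunk to every hash context with hash_update(). For pipes and network streams, fread() may return fewer bytes than requested per call; that's fine, since the resulting hash doesn't depend on how the input is split into chunks.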
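As a quick sanity check of that idea, here's a minimal, self-contained sketch showing that feeding a stream to several hash contexts chunk by chunk gives the same digests as hashing the whole input in one go. The sample data, the php://memory stream and the deliberately small 4096-byte chunk size are just assumptions for the demo, not part of the function above.

```php
<?php
// Hypothetical sample input; any string would do.
$data = str_repeat("The quick brown fox jumps over the lazy dog\n", 1000);

// Put the data behind an in-memory stream so it can be read in chunks.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $data);
rewind($stream);

// One hash context per algorithm.
$algos = ['md5', 'sha1', 'sha256'];
$ctx = [];
foreach ($algos as $algo) {
    $ctx[$algo] = hash_init($algo);
}

// Small chunk size on purpose, so the loop runs several times.
while (($chunk = fread($stream, 4096)) !== false && $chunk !== '') {
    foreach ($algos as $algo) {
        hash_update($ctx[$algo], $chunk);
    }
}
fclose($stream);

// Finalise each context and compare with the one-shot hash() result.
$incremental = [];
foreach ($algos as $algo) {
    $incremental[$algo] = hash_final($ctx[$algo]);
    var_dump($incremental[$algo] === hash($algo, $data));
}
```

Each var_dump() should report true, no matter which chunk size you pick.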

*) Lawrence updated his answer while I was writing this, but I feel that mine is still sufficiently distinct to justify keeping both of them. The main differences between this solution and Lawrence's updated version are that my function takes an input stream instead of a filename, and that I'm using fread() instead of fgets() (since for hashing, there's no need to split the input on newlines).

Upvotes: 2
