Reputation: 4017
I would like to hash a given file using multiple algorithms, but right now I'm doing it sequentially, like this:
return [
    hash_file('md5', $uri),
    hash_file('sha1', $uri),
    hash_file('sha256', $uri)
];
Is there any way to hash that file opening only one stream, rather than N streams, where N is the number of algorithms I want to use? Something like this:
return hash_file(['md5', 'sha1', 'sha256'], $uri);
Upvotes: 2
Views: 692
Reputation: 46602
You can open a single file pointer, then use hash_init() with hash_update() to calculate each hash without reopening the file, and finally call hash_final() to get the resulting digests.
<?php
function hash_file_multi($algos, $filename) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }
    if (!is_string($filename)) {
        throw new \InvalidArgumentException('Second argument must be a string');
    }
    if (!file_exists($filename)) {
        throw new \InvalidArgumentException('Second argument, file not found');
    }
    $result = [];
    $fp = fopen($filename, 'rb'); // binary-safe read mode
    if ($fp) {
        // initialise one hash context per algorithm
        foreach ($algos as $algo) {
            $ctx[$algo] = hash_init($algo);
        }
        // read the file once, feeding each chunk to every context
        while (!feof($fp)) {
            $buffer = fgets($fp, 65536);
            if ($buffer === false) {
                break; // fgets() returns false at EOF or on error
            }
            foreach ($ctx as $context) {
                hash_update($context, $buffer);
            }
        }
        // finalise each hash and store it in the result
        foreach ($algos as $algo) {
            $result[$algo] = hash_final($ctx[$algo]);
        }
        fclose($fp);
    } else {
        throw new \InvalidArgumentException('Could not open file for reading');
    }
    return $result;
}
$result = hash_file_multi(['md5', 'sha1', 'sha256'], $uri);
var_dump($result['md5'] === hash_file('md5', $uri)); //true
var_dump($result['sha1'] === hash_file('sha1', $uri)); //true
var_dump($result['sha256'] === hash_file('sha256', $uri)); //true
Also posted to PHP manual: http://php.net/manual/en/function.hash-file.php#122549
Upvotes: 6
Reputation: 50328
Here's a modification of Lawrence Cherone's solution* that reads the file only once, and works even for non-seekable streams such as STDIN:
<?php
function hash_stream_multi($algos, $stream) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }
    if (!is_resource($stream)) {
        throw new \InvalidArgumentException('Second argument must be a resource');
    }
    $result = [];
    foreach ($algos as $algo) {
        $ctx[$algo] = hash_init($algo);
    }
    while (!feof($stream)) {
        $chunk = fread($stream, 1 << 20); // read data in 1 MiB chunks
        if ($chunk === false) {
            break; // fread() returns false on error
        }
        foreach ($algos as $algo) {
            hash_update($ctx[$algo], $chunk);
        }
    }
    foreach ($algos as $algo) {
        $result[$algo] = hash_final($ctx[$algo]);
    }
    return $result;
}
// test: hash standard input with MD5, SHA-1 and SHA-256
$result = hash_stream_multi(['md5', 'sha1', 'sha256'], STDIN);
print_r($result);
It works by reading the data from the input stream with fread() in chunks (of one megabyte, which should give a reasonable balance between performance and memory use) and feeding the chunks to each hash with hash_update().
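To illustrate the principle both solutions rely on, here's a standalone sketch (with example data of my own choosing) showing that feeding input to hash_update() in chunks produces exactly the same digest as hashing the whole string at once:

```php
<?php
// Incremental hashing: chunked updates equal one-shot hashing.
$data = str_repeat('example data ', 1000);

$ctx = hash_init('sha256');
foreach (str_split($data, 4096) as $chunk) { // feed 4 KiB pieces
    hash_update($ctx, $chunk);
}
$chunked = hash_final($ctx);

var_dump($chunked === hash('sha256', $data)); // bool(true)
```

This is why the functions above can run several algorithms over a single pass of the file: each context independently accumulates the same stream of chunks.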
*) Lawrence updated his answer while I was writing this, but I feel that mine is still sufficiently distinct to justify keeping both of them. The main differences between this solution and Lawrence's updated version are that my function takes an input stream instead of a filename, and that I'm using fread() instead of fgets() (since for hashing, there's no need to split the input on newlines).
Upvotes: 2