Reputation: 4103
Is there a way to calculate the MD5 hash of a file before the upload to the server using Javascript?
Upvotes: 130
Views: 197905
Reputation: 3361
You can calculate MD5 in the browser using hash-wasm. Read the file in chunks so that the browser can handle big files. The following implementation produces an accurate MD5 of a 2 GB file in under 6 s (I think this may be the fastest and simplest possible design).
async function computeFileMD5(file, notifyprogress = () => {}) {
const CHUNK_SIZE = 10 * 1024 * 1024;
const hasher = await hashwasm.createMD5();
let offset = 0;
const totalSize = file.size;
const reader = new FileReader();
while (offset < totalSize) {
const slice = file.slice(offset, offset + CHUNK_SIZE);
const arrayBuffer = await new Promise((resolve, reject) => {
reader.onload = e => resolve(e.target.result);
reader.onerror = e => reject(e.target.error);
reader.readAsArrayBuffer(slice);
});
hasher.update(new Uint8Array(arrayBuffer));
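offset += CHUNK_SIZE;
// Report progress as a ratio in [0, 1]; the last chunk may be shorter than CHUNK_SIZE
notifyprogress(Math.min(offset, totalSize) / totalSize);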
}
return hasher.digest();
}
function updateProgressBar(ratio) {
const percentage = Math.min(ratio* 100, 100);
const progressBarFill = document.getElementById('progressBarFill');
const progressText = document.getElementById('progressText');
progressBarFill.style.width = percentage + '%';
progressText.textContent = `Progress: ${percentage.toFixed(2)}%`;
}
async function onFileSelected(event) {
const file = event.target.files[0];
if (!file) return;
updateProgressBar(0);
document.getElementById('md5Result').textContent = 'Computing...';
const start = Date.now();
const md5Hash = await computeFileMD5(file, updateProgressBar);
const end = Date.now();
const duration = end - start;
const fileSizeMB = file.size / 1024 / 1024;
const throughput = fileSizeMB / (duration / 1000);
document.getElementById('md5Result').innerHTML = `
Hash: ${md5Hash}<br>
Duration: ${duration} ms<br>
Throughput: ${throughput.toFixed(2)} MB/s
`;
}
document.addEventListener('DOMContentLoaded', () => {
const fileInput = document.getElementById('fileInput');
fileInput.addEventListener('change', onFileSelected, false);
});
#progressBar {
width: 300px;
height: 20px;
border: 1px solid #aaa;
margin-top: 10px;
position: relative;
}
#progressBarFill {
background: #4caf50;
width: 0;
height: 100%;
transition: width 0.2s;
}
<script src="https://cdn.jsdelivr.net/npm/hash-wasm@4/dist/md5.umd.min.js"></script>
<h1>Client-side MD5 computation</h1>
<input type="file" id="fileInput" />
<div id="progressBar">
<div id="progressBarFill"></div>
</div>
<p id="progressText">Progress: 0%</p>
<pre id="md5Result"></pre>
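For comparison, the same chunked loop can be written without FileReader, because Blob.prototype.slice() plus Blob.prototype.arrayBuffer() already give a promise per chunk in current browsers. A minimal sketch, assuming a hasher object with update() and digest() such as the one returned by hashwasm.createMD5(); hashBlobInChunks is a hypothetical helper name:

```javascript
// Hypothetical helper: feeds a Blob/File to an incremental hasher chunk by chunk.
// Assumes `hasher` exposes update(Uint8Array) and digest(), like hash-wasm's hashers.
async function hashBlobInChunks(blob, hasher, chunkSize = 10 * 1024 * 1024, onProgress = () => {}) {
  const total = blob.size;
  for (let offset = 0; offset < total; offset += chunkSize) {
    const end = Math.min(offset + chunkSize, total);
    // Only the current slice is held in memory.
    const buffer = await blob.slice(offset, end).arrayBuffer();
    hasher.update(new Uint8Array(buffer));
    onProgress(end / total); // ratio in (0, 1]
  }
  return hasher.digest();
}
```

For example, await hashBlobInChunks(file, await hashwasm.createMD5(), 10 * 1024 * 1024, updateProgressBar) should give the same hash as computeFileMD5(file, updateProgressBar).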
Upvotes: 1
Reputation: 4485
Here is my refactored version of @Brio's hash-wasm answer:
async function file_sha1(file){ return await _hashwasm_from_file(file, 'SHA1') }
async function file_md5(file){ return await _hashwasm_from_file(file, 'MD5') }
const FILE_CHUNK_SIZE = 64 * 1024 * 1024;
// Reads one slice with the shared FileReader and feeds it to the hasher
function _hash_chunk(fileReader, hasher, chunk) {
return new Promise((resolve, reject) => {
fileReader.onload = (e) => {
hasher.update(new Uint8Array(e.target.result));
resolve();
};
// Without an error handler the promise would never settle if the read fails
fileReader.onerror = () => reject(fileReader.error);
fileReader.readAsArrayBuffer(chunk);
});
}
async function _hashwasm_from_file(file, algo){
const fileReader = new FileReader();
// e.g. algo = 'MD5' resolves to hashwasm.createMD5()
const hashwasm_creator = hashwasm[`create${algo}`];
if (!hashwasm_creator) throw new Error(`hashwasm api create${algo}() not found`);
const hasher = await hashwasm_creator();
const chunkNumber = Math.floor(file.size / FILE_CHUNK_SIZE);
for (let i = 0; i <= chunkNumber; i++) {
const chunk = file.slice(
FILE_CHUNK_SIZE * i,
Math.min(FILE_CHUNK_SIZE * (i + 1), file.size)
);
await _hash_chunk(fileReader, hasher, chunk);
}
// digest() returns a lowercase hex string by default
return hasher.digest();
}
Usage: simply call await file_sha1(file), where file is a File object, to get its SHA-1 hash (await file_md5(file) does the same for MD5).
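With Math.floor(file.size / FILE_CHUNK_SIZE) and i <= chunkNumber, the loop performs one extra, empty read whenever the file size is an exact multiple of the chunk size. That does not change the hash, but Math.ceil avoids it. A small sketch of computing the slice boundaries that way (chunkRanges is a hypothetical helper, not part of hash-wasm):

```javascript
// Hypothetical helper: returns [start, end) byte ranges covering `size` bytes.
// Math.ceil gives one range per (possibly partial) chunk and no empty trailing range.
function chunkRanges(size, chunkSize) {
  if (!(chunkSize > 0)) throw new RangeError("chunkSize must be positive");
  const count = Math.ceil(size / chunkSize);
  const ranges = [];
  for (let i = 0; i < count; i++) {
    const start = i * chunkSize;
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}
```

Each [start, end] pair maps to file.slice(start, end). An empty file yields no ranges, so no update() call is made and the digest of zero bytes is returned.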
Upvotes: 0
Reputation: 2806
This is another hash-wasm example, using the Streams API instead of FileReader:
import { createSHA1 } from 'hash-wasm'
async function calculateSHA1(file: File) {
const hasher = await createSHA1()
const hasherStream = new WritableStream<Uint8Array>({
start: () => {
hasher.init()
// you can set UI state here also
},
write: chunk => {
hasher.update(chunk)
// you can set UI state here also
},
close: () => {
// you can set UI state here also
},
})
await file.stream().pipeTo(hasherStream)
return hasher.digest('hex')
}
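The write callback is also a convenient place to report progress: add each chunk's byteLength to a running total and divide by file.size. A plain JavaScript sketch, assuming a hasher with init(), update() and digest() such as the object returned by hash-wasm's createSHA1(); hashStreamWithProgress is a hypothetical name:

```javascript
// Hypothetical helper: pipes blob.stream() into an incremental hasher and reports progress.
async function hashStreamWithProgress(blob, hasher, onProgress = () => {}) {
  let bytesRead = 0;
  hasher.init(); // start from a clean state
  const sink = new WritableStream({
    write(chunk) {
      // Chunks produced by Blob.stream() are Uint8Arrays.
      hasher.update(chunk);
      bytesRead += chunk.byteLength;
      // write() is only called with data, so blob.size is non-zero here.
      onProgress(bytesRead / blob.size);
    },
  });
  await blob.stream().pipeTo(sink);
  return hasher.digest();
}
```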
Upvotes: 0
Reputation: 14863
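If SHA-256 is also fine, the built-in Web Crypto API can compute it without a library. Note that crypto.subtle.digest() does not support MD5, is only available in secure contexts (HTTPS), and hashes the whole buffer in one call, so the entire file is loaded into memory:
async function sha256(file: File): Promise<string> {
// get byte array of file
const buffer = await file.arrayBuffer();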
// hash the message
const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);
// convert ArrayBuffer to Array
const hashArray = Array.from(new Uint8Array(hashBuffer));
// convert bytes to hex string
const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
return hashHex;
}
Upvotes: 10
Reputation: 300845
While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.
I wrote that in 2009. So what about new browsers?
With a browser that supports the File API, you can read the contents of a file (the user has to have selected it, either with an <input> element or by drag-and-drop). As of Jan 2013, here's how the major browsers stack up:
How?
See the answer below by Benny Neugebauer, which uses the MD5 function of CryptoJS.
Upvotes: 107
Reputation: 534
The following snippet shows an example that can achieve a throughput of 400 MB/s while reading and hashing the file.
It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than js-only libraries. As of 2020, all modern browsers support WebAssembly.
const chunkSize = 64 * 1024 * 1024;
const fileReader = new FileReader();
let hasher = null;
function hashChunk(chunk) {
return new Promise((resolve, reject) => {
fileReader.onload = (e) => {
const view = new Uint8Array(e.target.result);
hasher.update(view);
resolve();
};
// Reject instead of hanging forever if the read fails
fileReader.onerror = () => reject(fileReader.error);
fileReader.readAsArrayBuffer(chunk);
});
}
const readFile = async(file) => {
if (hasher) {
hasher.init();
} else {
hasher = await hashwasm.createMD5();
}
const chunkNumber = Math.floor(file.size / chunkSize);
for (let i = 0; i <= chunkNumber; i++) {
const chunk = file.slice(
chunkSize * i,
Math.min(chunkSize * (i + 1), file.size)
);
await hashChunk(chunk);
}
const hash = hasher.digest();
return Promise.resolve(hash);
};
const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");
fileSelector.addEventListener("change", async(event) => {
const file = event.target.files[0];
if (!file) return; // the file dialog was cancelled
resultElement.innerHTML = "Loading...";
const start = Date.now();
const hash = await readFile(file);
const end = Date.now();
const duration = end - start;
const fileSizeMB = file.size / 1024 / 1024;
const throughput = fileSizeMB / (duration / 1000);
resultElement.innerHTML = `
Hash: ${hash}<br>
Duration: ${duration} ms<br>
Throughput: ${throughput.toFixed(2)} MB/s
`;
});
<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->
<input type="file" id="file-input">
<div id="result"></div>
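The throughput line is simple arithmetic: the size in MiB (bytes / 1024 / 1024) divided by the elapsed seconds, so 100 MiB hashed in 250 ms is reported as 400 MB/s. With Date.now(), a small file can finish in 0 ms, and the division then yields Infinity. A sketch of the same formula with that case guarded (throughputMBps is a hypothetical helper):

```javascript
// Hypothetical helper: same formula as fileSizeMB / (duration / 1000) above.
// Returns null when the elapsed time is too short to measure (e.g. 0 ms).
function throughputMBps(bytes, durationMs) {
  if (!(durationMs > 0)) return null;
  const sizeMB = bytes / 1024 / 1024; // MiB, as in the snippet above
  return sizeMB / (durationMs / 1000);
}
```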
Upvotes: 30
Reputation: 735
Hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5:
import SparkMD5 from 'spark-md5';
// Read in chunks of 2 MB
const CHUNK_SIZE = 2097152;
/**
* Incrementally calculate the checksum of a given file using the MD5 algorithm
*/
export const checksum = (file) =>
new Promise((resolve, reject) => {
let currentChunk = 0;
const chunks = Math.ceil(file.size / CHUNK_SIZE);
const blobSlice =
File.prototype.slice ||
File.prototype.mozSlice ||
File.prototype.webkitSlice;
const spark = new SparkMD5.ArrayBuffer();
const fileReader = new FileReader();
const loadNext = () => {
const start = currentChunk * CHUNK_SIZE;
const end = Math.min(start + CHUNK_SIZE, file.size);
// Selectively read the file and only store part of it in memory.
// This allows client-side applications to process huge files without the need for huge memory
fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
};
fileReader.onload = e => {
spark.append(e.target.result);
currentChunk++;
if (currentChunk < chunks) loadNext();
else resolve(spark.end());
};
fileReader.onerror = () => {
reject(new Error('Calculating file checksum failed'));
};
loadNext();
});
Upvotes: 3
Reputation: 54812
It is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash of an image that has been dragged into your browser:
var holder = document.getElementById('holder');
holder.ondragover = function() {
return false;
};
holder.ondragend = function() {
return false;
};
holder.ondrop = function(event) {
event.preventDefault();
var file = event.dataTransfer.files[0];
var reader = new FileReader();
reader.onload = function(event) {
var binary = event.target.result;
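// readAsBinaryString yields one char per byte; parse it as Latin-1 so CryptoJS
// hashes the raw bytes instead of re-encoding the string as UTF-8
var md5 = CryptoJS.MD5(CryptoJS.enc.Latin1.parse(binary)).toString();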
console.log(md5);
};
reader.readAsBinaryString(file);
};
I recommend adding some CSS to make the drag-and-drop area visible:
#holder {
border: 10px dashed #ccc;
width: 300px;
height: 300px;
}
#holder.hover {
border: 10px dashed #333;
}
More about the Drag & Drop functionality can be found here: File API & FileReader
I tested the sample in Google Chrome Version 32.
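When hashing the output of readAsBinaryString with CryptoJS, keep in mind that the string holds one character per byte (code points 0 to 255), while CryptoJS.MD5() treats a plain string argument as text and encodes it as UTF-8 first. Every byte of 0x80 or above then turns into two bytes, so the digest no longer matches the file; wrapping the string in CryptoJS.enc.Latin1.parse() hashes the raw bytes instead. The sketch below shows the difference without CryptoJS (binaryStringToBytes is a hypothetical helper):

```javascript
// Hypothetical helper: converts a "binary string" (one char per byte, as produced by
// FileReader.readAsBinaryString) back into the original bytes (Latin-1 decoding).
function binaryStringToBytes(binary) {
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// A file containing the single byte 0xFF is read as the one-character string "\xff".
const binary = "\xff";
console.log(binaryStringToBytes(binary));      // one byte: 255 (what should be hashed)
console.log(new TextEncoder().encode(binary)); // two bytes: 195, 191 (UTF-8 of U+00FF)
```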
Upvotes: 36
Reputation: 34207
Using spark-md5 and Q:
Assuming you're using a modern browser (one that supports the HTML5 File API), here's how to calculate the MD5 hash of a large file. It reads and hashes the file in chunks of a configurable size (bufferSize).
function calculateMD5Hash(file, bufferSize) {
var def = Q.defer();
var fileReader = new FileReader();
var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
var hashAlgorithm = new SparkMD5();
var totalParts = Math.ceil(file.size / bufferSize);
var currentPart = 0;
var startTime = new Date().getTime();
fileReader.onload = function(e) {
currentPart += 1;
def.notify({
currentPart: currentPart,
totalParts: totalParts
});
var buffer = e.target.result;
hashAlgorithm.appendBinary(buffer);
if (currentPart < totalParts) {
processNextPart();
return;
}
def.resolve({
hashResult: hashAlgorithm.end(),
duration: new Date().getTime() - startTime
});
};
fileReader.onerror = function(e) {
def.reject(e);
};
function processNextPart() {
var start = currentPart * bufferSize;
var end = Math.min(start + bufferSize, file.size);
fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
}
processNextPart();
return def.promise;
}
function calculate() {
var input = document.getElementById('file');
if (!input.files.length) {
return;
}
var file = input.files[0];
var bufferSize = Math.pow(1024, 2) * 10; // 10MB
calculateMD5Hash(file, bufferSize).then(
function(result) {
// Success
console.log(result);
},
function(err) {
// There was an error reading the file
console.error(err);
},
function(progress) {
// We get notified of the progress as it is executed
console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', Math.min(progress.currentPart * bufferSize, file.size), 'of', file.size);
});
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>
<div>
<input type="file" id="file"/>
<input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>
Upvotes: 15
Reputation: 339
There are a lot of options for hashing files; the usual problem is that hashing big files is really slow.
I created a small library that computes a hash of a file from its first 64 KB and its last 64 KB.
Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme
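Note that a hash of the first and last 64 KB is not the file's MD5: two files that differ only in the middle produce the same value, so it works as a fast fingerprint rather than an integrity check. A sketch of the sampling step only (not the hashme library's own code; sampleHeadAndTail is a hypothetical helper), whose result can be fed to any hash function:

```javascript
// Hypothetical helper: builds a Blob from the first and last `sampleSize` bytes of `blob`.
// Blobs no larger than 2 * sampleSize are returned whole, so no byte is counted twice.
function sampleHeadAndTail(blob, sampleSize = 64 * 1024) {
  if (blob.size <= 2 * sampleSize) return blob;
  return new Blob([
    blob.slice(0, sampleSize),             // head
    blob.slice(blob.size - sampleSize),    // tail
  ]);
}
```

For example: crypto.subtle.digest('SHA-256', await sampleHeadAndTail(file).arrayBuffer()).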
Upvotes: 3
Reputation: 619
I've made a library that implements incremental MD5 in order to hash large files efficiently. Basically, you read a file in chunks (to keep memory usage low) and hash it incrementally. Basic usage and examples are in the README.
Be aware that you need the HTML5 File API, so be sure to check for it. There is a full example in the test folder.
https://github.com/satazor/SparkMD5
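A minimal feature check for the File API pieces these chunked approaches rely on (File, FileList, FileReader and a Blob slice method, including the old vendor-prefixed variants used in other answers here). hasFileApi is a hypothetical helper; the optional argument only exists so it can be exercised with a stub object:

```javascript
// Hypothetical check: true when the File API features needed for chunked hashing exist.
function hasFileApi(g = globalThis) {
  const proto = g.Blob && g.Blob.prototype;
  // Old Firefox/WebKit builds exposed slicing as mozSlice/webkitSlice.
  const canSlice = Boolean(proto && (proto.slice || proto.mozSlice || proto.webkitSlice));
  return Boolean(g.File && g.FileList && g.FileReader) && canSlice;
}
```

In a page, if (!hasFileApi()) { /* fall back to hashing on the server */ } keeps older browsers from failing halfway through.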
Upvotes: 34
Reputation: 2597
You need to use the File API. It is available in the latest Firefox and Chrome, but not in IE9. Grab any of the MD5 JS implementations suggested above. I tried this and abandoned it because JS was too slow (minutes on large image files). I might revisit it if someone rewrites MD5 using typed arrays.
Code would look something like this:
HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">
JS (with jQuery):
$("#file-dialog").change(function() {
handleFiles(this.files);
});
function handleFiles(files) {
for (var i=0; i<files.length; i++) {
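// const gives each iteration its own reader; with var, every onload callback
// would read the result of the last reader created in the loop
const reader = new FileReader();
reader.onload = function() {
// binl_md5 expects an array of words, not a string; with Paul Johnston's md5.js,
// hash the raw binary string with rstr_md5 and hex-encode it with rstr2hex
var md5 = rstr2hex(rstr_md5(reader.result));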
console.log("MD5 is " + md5);
};
reader.onerror = function() {
console.error("Could not read the file");
};
reader.readAsBinaryString(files.item(i));
}
}
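If the reader is declared with var inside the loop, every onload callback sees the last reader created, so hashes end up computed from the wrong file's data. A promise-based sketch avoids per-file callbacks altogether by using Blob.prototype.arrayBuffer() (available in current browsers) and Promise.all, which also keeps the results in the same order as files. hashFiles and its hashBytes callback are hypothetical; plug in an MD5 function that accepts a Uint8Array:

```javascript
// Hypothetical helper: hashes every file in a FileList (or array), preserving order.
// `hashBytes` is any function mapping a Uint8Array to a hash, sync or async.
async function hashFiles(files, hashBytes) {
  return Promise.all(
    Array.from(files, async (file) => {
      const bytes = new Uint8Array(await file.arrayBuffer());
      return { name: file.name, hash: await hashBytes(bytes) };
    })
  );
}
```

This reads each file fully into memory, so for large files the chunked approaches in other answers are a better fit.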
Upvotes: 8
Reputation: 71
Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05
That is useless in most cases. You want the MD5 computed on the client side so that you can compare it with the hash recomputed on the server side and conclude that the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files was key. My case was simple, because users already had the MD5 computed by their data-analysis tools, so I just needed to ask them for it in a text field.
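For such a check, the comparison itself should tolerate cosmetic differences, since tools may print the digest in upper case or with a trailing newline. A sketch, assuming both sides are hex strings (md5Matches is a hypothetical helper):

```javascript
// Hypothetical helper: compares two hex MD5 digests, ignoring case and surrounding whitespace.
function md5Matches(clientHex, serverHex) {
  const normalize = (h) => String(h).trim().toLowerCase();
  const a = normalize(clientHex);
  const b = normalize(serverHex);
  // An MD5 digest is 16 bytes, i.e. exactly 32 hex characters.
  return /^[0-9a-f]{32}$/.test(a) && a === b;
}
```

If md5Matches(...) returns false, the upload can be treated as corrupted and retried.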
Upvotes: 7
Reputation: 2130
I don't believe there is a way in JavaScript to access the contents of a file upload, so you cannot look at the file contents to generate an MD5 sum.
You can, however, send the file to the server, which can then send an MD5 sum back or send the file contents back, but that's a lot of work and probably not worthwhile for your purposes.
Upvotes: -5
Reputation: 40235
There are a couple of scripts on the internet that create an MD5 hash.
The one from webtoolkit is good: http://www.webtoolkit.info/javascript-md5.html
However, I don't believe it will have access to the local filesystem, as that access is limited.
Upvotes: 2