Reputation: 4103

How to calculate md5 hash of a file using javascript

Is there a way to calculate the MD5 hash of a file before the upload to the server using Javascript?

Upvotes: 130

Answers (15)

131

Reputation: 3361

You can calculate MD5 in the browser using hash-wasm. Read the file in chunks so that the browser can handle large files. The following implementation produces the correct MD5 of a 2 GB file in under 6 s (I think this may be the fastest and simplest design possible).

async function computeFileMD5(file, notifyprogress = () => {}) {
  const CHUNK_SIZE = 10 * 1024 * 1024;
  const hasher = await hashwasm.createMD5();
  let offset = 0;
  const totalSize = file.size;

  const reader = new FileReader();
  while (offset < totalSize) {
    const slice = file.slice(offset, offset + CHUNK_SIZE);

    const arrayBuffer = await new Promise((resolve, reject) => {
      reader.onload = e => resolve(e.target.result);
      reader.onerror = e => reject(e.target.error);
      reader.readAsArrayBuffer(slice);
    });

    hasher.update(new Uint8Array(arrayBuffer));

    offset += CHUNK_SIZE;

    notifyprogress(offset / totalSize);
  }

  return hasher.digest();
}



function updateProgressBar(ratio) {
  const percentage = Math.min(ratio* 100, 100);
  const progressBarFill = document.getElementById('progressBarFill');
  const progressText = document.getElementById('progressText');
  progressBarFill.style.width = percentage + '%';
  progressText.textContent = `Progress : ${percentage.toFixed(2)}%`;
}

async function onFileSelected(event) {
  const file = event.target.files[0];
  if (!file) return;

  updateProgressBar(0);
  document.getElementById('md5Result').textContent = 'Computing...';

  const start = Date.now();
  const md5Hash = await computeFileMD5(file, updateProgressBar);
  const end = Date.now();
  const duration = end - start;
  const fileSizeMB = file.size / 1024 / 1024;
  const throughput = fileSizeMB / (duration / 1000);

  document.getElementById('md5Result').innerHTML = `
    Hash: ${md5Hash}<br>
    Duration: ${duration} ms<br>
    Throughput: ${throughput.toFixed(2)} MB/s
  `;
}

document.addEventListener('DOMContentLoaded', () => {
  const fileInput = document.getElementById('fileInput');
  fileInput.addEventListener('change', onFileSelected, false);
});

    #progressBar {
      width: 300px;
      height: 20px;
      border: 1px solid #aaa;
      margin-top: 10px;
      position: relative;
    }
    #progressBarFill {
      background: #4caf50;
      width: 0;
      height: 100%;
      transition: width 0.2s;
    }

<script src="https://cdn.jsdelivr.net/npm/hash-wasm@4/dist/md5.umd.min.js"></script>

<h1>MD5 client side compute</h1>

<input type="file" id="fileInput" />

<div id="progressBar">
  <div id="progressBarFill"></div>
</div>
<p id="progressText">Progress : 0%</p>

<pre id="md5Result"></pre>

Upvotes: 1

Timothy C. Quinn

Reputation: 4485

Here is my refactored version of @Brio's hashwasm answer:


async function file_sha1(file){ return await _hashwasm_from_file(file, 'SHA1') }
async function file_md5(file){ return await _hashwasm_from_file(file, 'MD5') }

const FILE_CHUNK_SIZE = 64 * 1024 * 1024;
async function _hash_chunk(fileReader, hasher, chunk) {
  return new Promise((resolve, reject) => {
    fileReader.onload = (e) => {
      const view = new Uint8Array(e.target.result);
      hasher.update(view);
      resolve();
    };
    // Reject on read failure so the caller's await does not hang forever
    fileReader.onerror = () => reject(fileReader.error);

    fileReader.readAsArrayBuffer(chunk);
  });
}

async function _hashwasm_from_file(file, algo){
  const fileReader = new FileReader();
  let hashwasm_creator = hashwasm[`create${algo}`];
  if (!hashwasm_creator) throw new Error(`hashwasm api create${algo}() not found`);
  let hasher = await hashwasm_creator();

  const chunkNumber = Math.floor(file.size / FILE_CHUNK_SIZE);

  for (let i = 0; i <= chunkNumber; i++) {
    const chunk = file.slice(
      FILE_CHUNK_SIZE * i,
      Math.min(FILE_CHUNK_SIZE * (i + 1), file.size)
    );
    await _hash_chunk(fileReader, hasher, chunk);
  }

  // An async function already wraps its return value in a promise
  return hasher.digest();
}

Usage: call await file_md5(<file>) or await file_sha1(<file>) to get the MD5 or SHA-1 hash of <file>.

Upvotes: 0

Danielle Madeley

Reputation: 2806

This is another hash-wasm example, but it uses the Streams API instead of setting up a FileReader:

import { createSHA1 } from 'hash-wasm'

async function calculateSHA1(file: File) {
  const hasher = await createSHA1()

  const hasherStream = new WritableStream<Uint8Array>({
    start: () => {
      hasher.init()
      // you can set UI state here also
    },
    write: chunk => {
      hasher.update(chunk)
      // you can set UI state here also
    },
    close: () => {
      // you can set UI state here also
    },
  })

  await file.stream().pipeTo(hasherStream)

  return hasher.digest('hex')
}

Upvotes: 0

wutzebaer

Reputation: 14863

If sha256 is also fine:

  async sha256(file: File) {
    // get byte array of file
    let buffer = await file.arrayBuffer();

    // hash the message
    const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);

    // convert ArrayBuffer to Array
    const hashArray = Array.from(new Uint8Array(hashBuffer));

    // convert bytes to hex string
    const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
    return hashHex;
  }

Upvotes: 10

Paul Dixon

Reputation: 300845

While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.

I wrote that in 2009. So what about new browsers?

With a browser that supports the FileAPI, you can read the contents of a file - the user has to have selected it, either with an <input> element or drag-and-drop. As of Jan 2013, here's how the major browsers stack up:

FF 3.6 supports FileReader, FF4 supports even more file based functionality
Chrome has supported the FileAPI since version 7.0.517.41
Internet Explorer 10 has partial FileAPI support
Opera 11.10 has partial support for FileAPI
Safari - I couldn't find a good official source for this, but this site suggests partial support from 5.1, full support for 6.0. Another article reports some inconsistencies with the older Safari versions
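
Rather than relying on version tables, a page can check for these capabilities at runtime. The following is a minimal sketch; `supportsFileHashing` is a hypothetical helper name, and it takes the global object as a parameter only so that it can be exercised outside a browser.

```javascript
// Hypothetical helper: reports whether the environment exposes the pieces
// needed to read a user-selected file in chunks (FileReader and Blob#slice).
function supportsFileHashing(globalObject) {
  const hasFileReader = typeof globalObject.FileReader === 'function';
  const blobCtor = globalObject.Blob;
  const hasSlice =
    typeof blobCtor === 'function' &&
    typeof blobCtor.prototype.slice === 'function';
  return hasFileReader && hasSlice;
}

// In a browser you would pass `window` (or `globalThis`).
if (typeof window !== 'undefined' && !supportsFileHashing(window)) {
  console.warn('File API not available; compute the hash on the server instead.');
}
```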

How?

See the answer below by Benny Neugebauer which uses the MD5 function of CryptoJS

Upvotes: 107

Biró Dani

Reputation: 534

The following snippet shows an example that can achieve a throughput of 400 MB/s while reading and hashing the file.

It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than js-only libraries. As of 2020, all modern browsers support WebAssembly.

const chunkSize = 64 * 1024 * 1024;
const fileReader = new FileReader();
let hasher = null;

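function hashChunk(chunk) {
  return new Promise((resolve, reject) => {
    fileReader.onload = (e) => {
      const view = new Uint8Array(e.target.result);
      hasher.update(view);
      resolve();
    };
    // Reject on read failure so the caller's await does not hang forever
    fileReader.onerror = () => reject(fileReader.error);

    fileReader.readAsArrayBuffer(chunk);
  });
}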

const readFile = async(file) => {
  if (hasher) {
    hasher.init();
  } else {
    hasher = await hashwasm.createMD5();
  }

  const chunkNumber = Math.floor(file.size / chunkSize);

  for (let i = 0; i <= chunkNumber; i++) {
    const chunk = file.slice(
      chunkSize * i,
      Math.min(chunkSize * (i + 1), file.size)
    );
    await hashChunk(chunk);
  }

  const hash = hasher.digest();
  return Promise.resolve(hash);
};

const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");

fileSelector.addEventListener("change", async(event) => {
  const file = event.target.files[0];

  resultElement.innerHTML = "Loading...";
  const start = Date.now();
  const hash = await readFile(file);
  const end = Date.now();
  const duration = end - start;
  const fileSizeMB = file.size / 1024 / 1024;
  const throughput = fileSizeMB / (duration / 1000);
  resultElement.innerHTML = `
    Hash: ${hash}<br>
    Duration: ${duration} ms<br>
    Throughput: ${throughput.toFixed(2)} MB/s
  `;
});

<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->

<input type="file" id="file-input">
<div id="result"></div>

Upvotes: 30

Zico Deng

Reputation: 735

Hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5:

import SparkMD5 from 'spark-md5';

// Read in chunks of 2MB
const CHUNK_SIZE = 2097152;

/**
 * Incrementally calculate checksum of a given file based on MD5 algorithm
 */
export const checksum = (file) =>
  new Promise((resolve, reject) => {
    let currentChunk = 0;
    const chunks = Math.ceil(file.size / CHUNK_SIZE);
    const blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice;
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();

    const loadNext = () => {
      const start = currentChunk * CHUNK_SIZE;
      const end =
        start + CHUNK_SIZE >= file.size ? file.size : start + CHUNK_SIZE;

      // Selectively read the file and only store part of it in memory.
      // This allows client-side applications to process huge files without the need for huge memory
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
    };

    fileReader.onload = e => {
      spark.append(e.target.result);
      currentChunk++;

      if (currentChunk < chunks) loadNext();
      else resolve(spark.end());
    };

    fileReader.onerror = () => {
      // Reject with an Error so callers get a stack trace
      return reject(new Error('Calculating file checksum failed'));
    };

    loadNext();
  });

Upvotes: 3

Benny Code

Reputation: 54812

It is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash of an image that has been dragged into your browser:

var holder = document.getElementById('holder');

holder.ondragover = function() {
  return false;
};

holder.ondragend = function() {
  return false;
};

holder.ondrop = function(event) {
  event.preventDefault();

  var file = event.dataTransfer.files[0];
  var reader = new FileReader();

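  reader.onload = function(event) {
    var binary = event.target.result;
    // Parse as Latin1 so each character maps to one byte; passing the raw
    // string would make CryptoJS encode it as UTF-8 and hash different bytes
    var md5 = CryptoJS.MD5(CryptoJS.enc.Latin1.parse(binary)).toString();
    console.log(md5);
  };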

  reader.readAsBinaryString(file);
};

I recommend adding some CSS to make the Drag & Drop area visible:

#holder {
  border: 10px dashed #ccc;
  width: 300px;
  height: 300px;
}

#holder.hover {
  border: 10px dashed #333;
}

More about the Drag & Drop functionality can be found here: File API & FileReader

I tested the sample in Google Chrome Version 32.

Upvotes: 36

Jossef Harush Kadouri

Reputation: 34207

HTML5 + `spark-md5` and `Q`

Assuming you're using a modern browser (one that supports the HTML5 File API), here's how to calculate the MD5 hash of a large file (the hash is calculated incrementally, over chunks of a configurable size):

function calculateMD5Hash(file, bufferSize) {
  var def = Q.defer();

  var fileReader = new FileReader();
  var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  var hashAlgorithm = new SparkMD5();
  var totalParts = Math.ceil(file.size / bufferSize);
  var currentPart = 0;
  var startTime = new Date().getTime();

  fileReader.onload = function(e) {
    currentPart += 1;

    def.notify({
      currentPart: currentPart,
      totalParts: totalParts
    });

    var buffer = e.target.result;
    hashAlgorithm.appendBinary(buffer);

    if (currentPart < totalParts) {
      processNextPart();
      return;
    }

    def.resolve({
      hashResult: hashAlgorithm.end(),
      duration: new Date().getTime() - startTime
    });
  };

  fileReader.onerror = function(e) {
    def.reject(e);
  };

  function processNextPart() {
    var start = currentPart * bufferSize;
    var end = Math.min(start + bufferSize, file.size);
    fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
  }

  processNextPart();
  return def.promise;
}

function calculate() {

  var input = document.getElementById('file');
  if (!input.files.length) {
    return;
  }

  var file = input.files[0];
  var bufferSize = Math.pow(1024, 2) * 10; // 10MB

  calculateMD5Hash(file, bufferSize).then(
    function(result) {
      // Success
      console.log(result);
    },
    function(err) {
      // There was an error,
    },
    function(progress) {
      // We get notified of the progress as it is executed
      console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
    });
}

<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>

<div>
  <input type="file" id="file"/>
  <input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>

Upvotes: 15

Marco Antonio

Reputation: 339

To get the hash of files, there are a lot of options. Normally the problem is that it's really slow to get the hash of big files.

I created a little library that hashes files using only the first 64 KB and the last 64 KB of the file.

Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme
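
The sampling idea can be sketched as follows. This is not the hashme library's code: `headTailRanges` and `sampleHeadAndTail` are hypothetical names, and the 64 KB window is taken from the description above. A digest of these samples identifies a file only approximately, since changes in the middle of a large file go undetected.

```javascript
const WINDOW_SIZE = 64 * 1024; // 64 KB, as described above

// Returns the byte ranges to read: the first and last `windowSize` bytes,
// or a single range covering the whole file when the two would overlap.
function headTailRanges(fileSize, windowSize = WINDOW_SIZE) {
  if (fileSize <= 2 * windowSize) {
    return [[0, fileSize]];
  }
  return [
    [0, windowSize],
    [fileSize - windowSize, fileSize],
  ];
}

// Reads only the sampled ranges of a File/Blob and returns their bytes
// concatenated; feed the result to whichever MD5 implementation you use.
async function sampleHeadAndTail(file, windowSize = WINDOW_SIZE) {
  const parts = [];
  for (const [start, end] of headTailRanges(file.size, windowSize)) {
    parts.push(new Uint8Array(await file.slice(start, end).arrayBuffer()));
  }
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  const sample = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    sample.set(part, offset);
    offset += part.length;
  }
  return sample;
}
```

However large the file is, at most 128 KB is read, which is why this approach stays fast.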

Upvotes: 3

satazor

Reputation: 619

I've made a library that implements incremental MD5 in order to hash large files efficiently. Basically, you read a file in chunks (to keep memory usage low) and hash it incrementally. You'll find basic usage and examples in the readme.

Be aware that you need HTML5 FileAPI, so be sure to check for it. There is a full example in the test folder.

https://github.com/satazor/SparkMD5

Upvotes: 34

Aleksandar Totic

Reputation: 2597

You need to use the FileAPI. It is available in the latest FF and Chrome, but not in IE9. Grab any of the MD5 JS implementations suggested above. I tried this and abandoned it because JS was too slow (minutes on large image files). I might revisit it if someone rewrites MD5 using typed arrays.

Code would look something like this:

HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">

JS (with jQuery):

$("#file-dialog").change(function() {
  handleFiles(this.files);
});

function handleFiles(files) {
    for (var i = 0; i < files.length; i++) {
        var reader = new FileReader();
        // Use the event target: `reader` is function-scoped, so by the time
        // onload fires it refers to the reader of the last file
        reader.onload = function(e) {
            var data = e.target.result;
            var md5 = binl_md5(data, data.length);
            console.log("MD5 is " + md5);
        };
        reader.onerror = function() {
            console.error("Could not read the file");
        };
        reader.readAsBinaryString(files.item(i));
    }
}

Upvotes: 8

Marco

Reputation: 71

Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05

Which is useless in most cases. You want the MD5 computed on the client side so that you can compare it with the one recomputed on the server side and conclude that the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files was key. My case was simple, because users already had the MD5 computed by their data analysis tools, so I just needed to ask them for it in a text field.
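
A minimal sketch of that server-side check, assuming a Node.js server (the original does not say what the server runs). `md5OfChunks` and `uploadIsIntact` are hypothetical names; `createHash` is part of Node's built-in `crypto` module.

```javascript
const nodeCrypto = require('node:crypto');

// Recomputes the MD5 of the received data incrementally, chunk by chunk.
function md5OfChunks(chunks) {
  const hash = nodeCrypto.createHash('md5');
  for (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Compares the client-supplied digest (e.g. typed into a text field)
// with the digest recomputed from the bytes the server received.
function uploadIsIntact(clientDigest, receivedChunks) {
  const normalized = String(clientDigest).trim().toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(normalized)) {
    return false; // not a well-formed MD5 hex digest
  }
  return normalized === md5OfChunks(receivedChunks);
}
```

Normalizing case and whitespace first avoids false mismatches when the digest was pasted by hand.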

Upvotes: 7

kbosak

Reputation: 2130

I don't believe there is a way in JavaScript to access the contents of a file upload, so you cannot look at the file contents to generate an MD5 sum.

You can however send the file to the server, which can then send an MD5 sum back or send the file contents back .. but that's a lot of work and probably not worthwhile for your purposes.

Upvotes: -5

bendewey

Reputation: 40235

There are a couple of scripts out there on the internet to create an MD5 hash.

The one from webtoolkit is good, http://www.webtoolkit.info/javascript-md5.html

Although, I don't believe it will have access to the local filesystem as that access is limited.

Upvotes: 2
