tmuecksch

Reputation: 6642

Chrome FileReader returns empty string for big files (>= 300MB)

Goal:

Read a user-selected file with FileReader.readAsDataURL() and get the complete base64 data URL string as the result.

Issue:

For big files (>= 300MB), Chrome fires the load event, but reader.result is an empty string. No error event fires and no exception is thrown.

Questions:

  • Is this a Chrome bug?
  • Why is there neither an error nor an exception?
  • How can I fix or work around this issue?

Important:

Please note that chunking is not an option for me, since I need to send the full base64 string via POST to an API that does not support chunks.

Code:

'use strict';

var filePickerElement = document.getElementById('filepicker');

filePickerElement.onchange = (event) => {
  const selectedFile = event.target.files[0];
  console.log('selectedFile', selectedFile);

  readFile(selectedFile);
};

function readFile(selectedFile) {
  console.log('START READING FILE');
  const reader = new FileReader();

  reader.onload = (e) => {
    const fileBase64 = reader.result.toString();

    console.log('ONLOAD','base64', fileBase64);
    
    if (fileBase64 === '') {
      alert('Result string is EMPTY :(');
    } else {
      alert('It worked as expected :)');
    }
  };

  reader.onprogress = (e) => {
    console.log('Progress', ~~((e.loaded / e.total) * 100 ), '%');
  };

  reader.onerror = (err) => {
    console.error('Error reading the file.', err);
  };

  reader.readAsDataURL(selectedFile);
}
<!doctype html>
<html lang="en">

<head>
  <!-- Required meta tags -->
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <!-- Bootstrap CSS -->
  <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet"
    integrity="sha384-wEmeIV1mKuiNpC+IOBjI7aAzPcEZeedi5yW5f2yOq55WWLwNGmvvx4Um1vskeMj0" crossorigin="anonymous">

  <title>FileReader issue example</title>
</head>

<body>

  <div class="container">
    <h1>FileReader issue example</h1>
    <div class="card">
      <div class="card-header">
        Select File:
      </div>
      <div class="card-body">
        <input type="file" id="filepicker" />
      </div>
    </div>

  </div>

  <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"
    integrity="sha384-p34f1UUtsS3wqzfto5wAAmdvj+osOnFyQFpp4Ua3gs/ZVWx6oOypYoCJhGGScy+8"
    crossorigin="anonymous"></script>
  <script src="main.js"></script>
</body>

</html>

Upvotes: 7

Views: 2186

Answers (2)

Kaiido

Reputation: 136746

Is this a Chrome bug?

As I said in my answer to Chrome, FileReader API, event.target.result === "", this is a limitation of V8 (the JavaScript engine used by Chrome, but also by Node.js and others).
It is intentional and thus can't really qualify as "a bug".
Technically, what fails here is building a String of more than 512MB (less the header) on 64-bit systems, because in V8 all heap objects must fit in a Smi (Small Integer) (cf. this commit).

Why is there neither an error nor an exception?

That might be a bug... As I also show in my linked answer, we get a RangeError when creating such a string directly:

const header = 24;
const bytes = new Uint8Array( (512 * 1024 * 1024) - header );
let txt = new TextDecoder().decode( bytes );
console.log( txt.length ); // 536870888
txt += "f"; // RangeError

And in step 3 of FileReader::readOperation, UAs have to

If package data threw an exception error:

  • Set fr’s error to error.
  • Fire a progress event called error at fr.

But here, we don't have that error.

// A ~600MB Blob: well over the 512MB string limit, yet no error is reported.
const bytes = Uint32Array.from( { length: 600 * 1024 * 1024 / 4 }, (_) => Math.random() * 0xFFFFFFFF );
const blob = new Blob( [ bytes ] );
const fr = new FileReader();
fr.onerror = console.error; // never fires
fr.onload = (evt) => console.log( "success", fr.result.length, fr.error ); // "success" 0 null
fr.readAsDataURL( blob );

I will open an issue about this, since you should be able to handle that error from the FileReader.

How can I fix or work around this issue?

The best fix is definitely to make your API endpoint accept binary resources directly instead of data: URLs, which should always be avoided anyway.
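
For illustration, a minimal sketch of such a direct upload (the /upload endpoint is a placeholder, and multipart form-data is just one possible transport):

// Hypothetical sketch: POST the File/Blob itself instead of a base64 string.
// '/upload' is a placeholder endpoint.
async function uploadBinary( file ) {
  const form = new FormData();
  form.append( 'file', file, file.name );
  return fetch( '/upload', { method: 'POST', body: form } );
}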

If this is not doable, a solution "for the future" would be to POST a ReadableStream to your endpoint and do the data: URL conversion yourself, on a stream read from the Blob.

class base64StreamEncoder {
  constructor( header ) {
    if( header ) {
      this.header = new TextEncoder().encode( header );
    }
    this.tail = [];
  }
  transform( chunk, controller ) {
    const encoded = this.encode( chunk );
    if( this.header ) {
      // prepend the data: URL header before the first encoded chunk
      controller.enqueue( this.header );
      this.header = null;
    }
    controller.enqueue( encoded );
  }
  encode( bytes ) {
    // start from the bytes left over by the previous chunk
    let binary = Array.from( this.tail )
        .reduce( (bin, byte) => bin + String.fromCharCode( byte ), "" );
    // keep the encoded length a multiple of 3 so btoa() never emits
    // "=" padding in the middle of the stream
    const tail_length = ( this.tail.length + bytes.length ) % 3;
    const last_index = bytes.length - tail_length;
    this.tail = bytes.subarray( last_index );
    for( let i = 0; i < last_index; i++ ) {
        binary += String.fromCharCode( bytes[ i ] );
    }
    const b64String = window.btoa( binary );
    return new TextEncoder().encode( b64String );
  }
  flush( controller ) {
    // force the encoding of the tail
    controller.enqueue( this.encode( new Uint8Array() ) );
  }
}
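
For illustration, a minimal sketch of how this transformer could be wired up (the /upload endpoint and the duplex: 'half' option are assumptions; streaming request bodies are not supported by every browser or server yet):

// Hypothetical usage sketch: pipe the Blob through a TransformStream
// built from the encoder and stream the result to a placeholder endpoint.
async function postAsDataURL( blob ) {
  const encoder = new base64StreamEncoder( `data:${ blob.type };base64,` );
  const base64Stream = blob.stream()
    .pipeThrough( new TransformStream( encoder ) );
  // Streaming request bodies require `duplex: "half"` in Chrome
  // and matching server-side support (assumption).
  return fetch( '/upload', { method: 'POST', body: base64Stream, duplex: 'half' } );
}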

Live example: https://base64streamencoder.glitch.me/

For now, you'd have to store chunks of the base64 representation in a Blob as demonstrated by Endless's answer.

However, beware that since this is a V8 limitation, the server side can face issues with strings this big as well, so in any case you should contact your API's maintainer.
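
For instance, a rough sketch of the same limit surfacing in Node.js (the exact error message and threshold can vary across Node/V8 versions):

// Node.js sketch: converting ~600MB of bytes to one base64 string
// runs into the same V8 string-length limit.
const bytes = Buffer.alloc(600 * 1024 * 1024);
try {
  const b64 = bytes.toString('base64'); // would be ~800MB of characters
  console.log(b64.length);
} catch (err) {
  // e.g. "Cannot create a string longer than 0x1fffffe8 characters"
  console.error(err.message);
}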

Upvotes: 4

Endless

Reputation: 37825

Here is a partial solution that transforms a blob, chunk by chunk, into base64 blobs, then concatenates everything into one JSON blob, with a JSON prefix/suffix and the base64 chunks in between.

Keeping it as a Blob allows the browser to optimize the memory allocation and offload it to disk if needed.

You could try changing the chunkSize to something larger; browsers like to keep smaller blob chunks in memory (one bucket).

// get some dummy gradient file (blob)
var a = document.createElement("canvas"),
    b = a.getContext("2d"),
    c = b.createLinearGradient(0, 0, 3000, 3000);
a.width = a.height = 3000;
c.addColorStop(0, "red");
c.addColorStop(1, "blue");
b.fillStyle = c;
b.fillRect(0, 0, a.width, a.height);
a.toBlob(main);

async function main (blob) {
  var fr = new FileReader()
  // Add 2 so the chunk size is a multiple of 3: that way btoa()
  // only produces "=" padding on the very last chunk
  var chunkSize = (1 << 16) + 2
  var pos = 0
  var b64chunks = []
  
  while (pos < blob.size) {
    await new Promise(rs => {
      fr.readAsDataURL(blob.slice(pos, pos + chunkSize))
      fr.onload = () => {
        const b64 = fr.result.split(',')[1]
        // Keeping it as a Blob allows the browser to offload memory to disk
        b64chunks.push(new Blob([b64]))
        rs()
      }
      pos += chunkSize
    })
  }

  // How you concatenate all chunks into JSON is now up to you.
  // This solution/answer is more of a guideline of what you need to do.
  // There are ways to do it more automatically, but here is the
  // simplest form.
  // (fyi: this new Blob won't create much data in memory, it only keeps
  // reference points to the other blobs' locations)
  const jsonBlob = new Blob([
    '{"data": "', ...b64chunks, '"}'
  ], { type: 'application/json' })

  /*
  // I strongly advise you to tell the API developers
  // to add support for binary/file upload (multipart/form-data).
  // base64 is roughly ~33% larger, and streaming
  // this data to disk on the server is almost impossible.
  fetch('./upload-files-to-bad-json-only-api', {
    method: 'POST',
    body: jsonBlob
  })
  */
  
  // Just a test that it still works
  //
  // new Response(jsonBlob).json().then(console.log)
  fetch('data:image/png;base64,' + await new Blob(b64chunks).text()).then(r => r.blob()).then(b => console.log(URL.createObjectURL(b)))
}

I have avoided doing base64 += fr.result.split(',')[1] and JSON.stringify, since gigabytes of data is a lot to hold in a single string, and JSON shouldn't carry binary data anyway.

Upvotes: 0
