ngalvin93

Reputation: 61

Parse JSON data chunk from ReadableStream with Vanilla JS

I am fetching a large JSON file (200 MB) and want to render the data as it streams in. The problem is that after I decode and parse the chunks that are streamed to me, I get a syntax error in the console: Unexpected end of JSON input. What I want to do is parse the returned chunks and do something with that data as soon as I get it. However, because the ReadableStream delivers chunks that are sliced at unpredictable boundaries, I cannot call JSON.parse() on the returned values. What sort of data massaging needs to be done to make this happen? Is there a better approach?

Here is my code:

const decoder = new TextDecoder('utf-8')
fetch("../files/response.json")
    .then(response => {
        const reader = response.body.getReader()
        new ReadableStream({
            start(controller) {
                function enqueueValues() {
                    reader.read()
                    .then(({ done, value }) => {
                        if (done) {
                            controller.close() // stream is complete
                            return
                        }
                        var decodedValue = decoder.decode(value) // one chunk of invalid json data in string format

                        console.log(JSON.parse(decodedValue)) // json syntax error

                        // do something with the json value here

                        controller.enqueue(value)

                        enqueueValues() // run until all data has been streamed
                    })
                }
                enqueueValues()
            }
        })
    })

Upvotes: 5

Views: 2866

Answers (2)

cdauth

Reputation: 7558

What you need is a JSON parser that supports streaming. There are many libraries out there, one of which is json-stream-es, which I maintain.

From the description you gave, it sounds like the data arrives in JSONL format, meaning that it is not a single JSON document but multiple ones, usually delimited by newlines.
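
For illustration, a JSONL payload looks something like this (the data here is made up):

{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Carol"}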

You can parse such a stream with json-stream-es like this:

import { parseJsonStream, streamToIterable } from "json-stream-es";

const response = await fetch("../files/response.json");
const values = response.body
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(parseJsonStream(undefined, { multi: true }));

for await (const decodedValue of streamToIterable(values)) {
    console.log(decodedValue);

    // do something with the json value here
}
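
If you would rather stay dependency-free and the payload really is newline-delimited, you can get the same effect in vanilla JS by buffering the decoded text and splitting on newlines yourself (a minimal sketch, assuming one JSON document per line):

const response = await fetch("../files/response.json");
const reader = response.body
    .pipeThrough(new TextDecoderStream())
    .getReader();

let buffer = "";
while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += value;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
        if (line.trim()) console.log(JSON.parse(line));
    }
}
if (buffer.trim()) console.log(JSON.parse(buffer)); // flush the final document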

Upvotes: 1

Vladimir Prudnikov

Reputation: 7242

I think the only way to achieve this is to send valid JSON data (an object or an array) in each chunk.

Here is a sample Express.js handler:

app.get("/stream", (req, res) => {
  let i = 0;
  const interval = setInterval(() => {
    i += 1;
    res.write(JSON.stringify([{ message: `Chunk ${i}` }]));
  }, 500);

  setTimeout(() => {
    clearInterval(interval);
    res.end(() => {
      console.log("End");
    });
  }, 5000);
});

The downside of this is that the final JSON (all the chunks concatenated into one string) is not valid. But holding a 200 MB object in the browser's memory is not good either.

UPDATE: I was trying to solve a similar issue in my project and found a workaround:

  • wrap all the chunks (objects) in an array
  • send the opening and closing square brackets as separate chunks
  • add a comma to the end of each data chunk

Then on the client, I ignore chunks that are equal to [ and ], and strip the trailing comma from each data chunk.

Server:

app.get("/stream", (req, res) => {
  let i = 0,
    chunkString;
  res.write("["); // <<---- OPENING bracket
  const interval = setInterval(() => {
    i += 1;
    chunkString = JSON.stringify({ message: `Chunk ${i}` });
    res.write(`${chunkString},`);   // <<----- Note the comma at the end of each data chunk
  }, 500);

  setTimeout(() => {
    clearInterval(interval);
    res.end("]", () => {. // <<---- CLOSING bracket
      console.log("End");
    });
  }, 5000);
});

Client:

const decoder = new TextDecoder("utf-8");

const handleJsonChunk = (jsonChunk) => {
  console.log("Received Json Chunk: ", jsonChunk);
};

const main = async () => {
  const response = await fetch("http://localhost:3000/stream");
  const reader = response.body.getReader();
  const skipValues = ["[", "]"];

  const work = (reader) => {
    reader.read().then(({ done, value }) => {
      if (!done) {
        let stringValue = decoder.decode(value, { stream: true }); // stream: true keeps multi-byte characters intact across reads
        const skip = skipValues.indexOf(stringValue) >= 0;
        if (skip) return work(reader);

        if (stringValue.endsWith(","))
          stringValue = stringValue.slice(0, -1);

        try {
          const jsonValue = JSON.parse(stringValue);
          handleJsonChunk(jsonValue);
        } catch (error) {
          console.log(`Failed to parse chunk. Error: ${error}`);
        }

        work(reader);
      }
    });
  };
  work(reader);
};

main();
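
Note that over a real network, reader.read() is not guaranteed to hand you exactly one res.write() payload; chunks can be coalesced or split in transit, which would break the [/]/trailing-comma assumptions above. A more defensive variant (a sketch, assuming the server is tweaked to terminate each element with a newline, e.g. res.write(`${chunkString},\n`), which is safe because JSON.stringify never emits raw newlines) buffers the decoded text and only parses complete lines, reusing handleJsonChunk from above:

const decoder = new TextDecoder("utf-8");

const main = async () => {
  const response = await fetch("http://localhost:3000/stream");
  const reader = response.body.getReader();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop(); // keep any partial line for the next read
    for (const line of lines) {
      // strip the array framing: leading bracket, trailing bracket or comma
      const element = line.replace(/^\[|\]$|,$/g, "").trim();
      if (element) handleJsonChunk(JSON.parse(element));
    }
  }

  // flush whatever remains once the stream ends (e.g. the closing bracket)
  const rest = buffer.replace(/^\[|\]$|,$/g, "").trim();
  if (rest) handleJsonChunk(JSON.parse(rest));
};

main();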

Upvotes: 1
