Reputation: 35
I'm working on a data dashboard with maps. I've created a table that contains positions of cars and I have a simple query.
SELECT * FROM table WHERE timeslice = ${timeslice}
Originally this was fast and returned the necessary data to the front end quickly enough, but the other team members have started creating tables that contain millions of records, and now the response time to the front end is way too slow. At specific time periods this query now returns upwards of 600k records (it used to max out at around 10k before the team started adding far more data).
I've added an index to the table, which dramatically improved query time from 10-15 seconds to 2 seconds. However, it can still take upwards of 1 minute for the front end to receive the response object. I can't find anything about improving that speed.
Thanks for any help.
Upvotes: 2
Views: 5447
Reputation: 10702
The first thing I would try is enabling gzip. If you're not already doing so, this can give you up to a 10x improvement in transfer speed, depending on how your data is structured and how much repetition there is. You can use koa-compress for koa or compression for express.
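With express that's essentially a one-liner. A minimal sketch (the /positions route and getPositions are placeholders, not anything from your code):

const express = require('express');
const compression = require('compression');

const app = express();

// negotiate gzip/deflate with the client and compress every response
app.use(compression());

app.get('/positions', async (req, res) => {
  // placeholder for whatever runs your timeslice query
  const rows = await getPositions(req.query.timeslice);
  res.json(rows); // this JSON body is now sent compressed
});

app.listen(3000);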
The next thing I would try is to see if there's any way you can do more processing to group/merge/filter the data before you send it to the client. For example, do you need the position of every single car, or could you get away with the number of cars in each grid square on the map? Then, when the user zooms in, you could fetch more detailed data for just the area they zoomed to.
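A rough sketch of the grid idea, using plain node-postgres (the table/column names and the 0.01-degree cell size are assumptions, adjust to your schema):

// count cars per grid cell for one timeslice instead of sending every point
const GRID_SIZE = 0.01; // degrees per cell; tune to your zoom levels

async function getGridCounts(client, timeslice) {
  const { rows } = await client.query(
    `SELECT
       floor(longitude / $2) AS grid_x,
       floor(latitude / $2)  AS grid_y,
       count(*)              AS car_count
     FROM car_positions
     WHERE timeslice = $1
     GROUP BY 1, 2`,
    [timeslice, GRID_SIZE]
  );
  return rows; // far fewer rows than the raw positions
}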
If none of that is an option, you're going to be looking at caching & streaming.
If you have a lot of repeat visits from the same users, and the data changes infrequently, you could use something like localforage to store the data on the client side, and then when making the request to the server, you could ask for just the records that have changed since a given date.
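A minimal sketch of that, assuming your API can accept a `since` parameter and that rows have a carId to merge on (both assumptions):

import localforage from 'localforage';

async function loadPositions(timeslice) {
  // previously cached rows and the time we last synced (null on first visit)
  const cached = (await localforage.getItem(`positions:${timeslice}`)) || [];
  const lastSync = await localforage.getItem(`lastSync:${timeslice}`);

  // only ask the server for records that changed since the last sync
  const url = `/api/positions?timeslice=${timeslice}` +
    (lastSync ? `&since=${encodeURIComponent(lastSync)}` : '');
  const changed = await (await fetch(url)).json();

  // merge the changed rows into the cached set, keyed by car id
  const byId = new Map(cached.map((row) => [row.carId, row]));
  for (const row of changed) byId.set(row.carId, row);
  const merged = [...byId.values()];

  await localforage.setItem(`positions:${timeslice}`, merged);
  await localforage.setItem(`lastSync:${timeslice}`, new Date().toISOString());
  return merged;
}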
If you really must, you can stream the data to the client. It'll take more or less the same amount of time to get all the data, but they'll see some of it appear as they go. @databases/pg has a db.stream method that returns a stream of rows; you could pipe this through something like newline-json to serialize it, and then pipe that into the response. On the client, you could use the newer fetch APIs on browsers that support them to retrieve the data as a stream, and parse each row as it comes in.
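A rough sketch of both ends. This leans on db.stream returning an object stream of rows as described above, hand-rolls the newline-delimited JSON with a Transform in place of newline-json, and the /positions/stream route, car_positions table and handleRow function are placeholders:

// server: stream rows out as newline-delimited JSON
const { Transform } = require('stream');
const { sql } = require('@databases/pg');

app.get('/positions/stream', (req, res) => {
  res.setHeader('Content-Type', 'application/x-ndjson');

  // one line of JSON per row (stands in for newline-json)
  const toNdjson = new Transform({
    writableObjectMode: true,
    transform(row, _enc, cb) {
      cb(null, JSON.stringify(row) + '\n');
    },
  });

  db.stream(sql`SELECT * FROM car_positions WHERE timeslice = ${req.query.timeslice}`)
    .pipe(toNdjson)
    .pipe(res);
});

// client: read the response as a stream and parse each complete line
async function streamPositions(timeslice) {
  const res = await fetch(`/positions/stream?timeslice=${timeslice}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep any trailing partial line for the next chunk
    for (const line of lines) {
      if (line) handleRow(JSON.parse(line)); // placeholder per-row handler
    }
  }
}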
Upvotes: 1
Reputation: 103
pg-cursor is a tool that might help here. The official node-postgres docs link to this solution and include code samples.
This will let you fetch your query in batches (of a size you determine) and act on each of those batches independently. Previous batches will be garbage collected appropriately (unless you're deliberately keeping references to those objects).
I've included a simplified version of my implementation below:
const Cursor = require('pg-cursor');
// `client` is assumed to be an already-connected pg client (e.g. checked out from a Pool)

// how many rows to retrieve at a time
const BATCH_SIZE = 100;

const cursor = client.query(new Cursor('MY QUERY'));

// wrap the whole retrieval in a promise
function processResults() {
  return new Promise((resolve, reject) => {
    (function read() {
      cursor.read(BATCH_SIZE, async (err, rows) => {
        if (err) {
          return reject(err);
        }
        // no more rows, so we're done!
        if (!rows.length) {
          return resolve(true);
        }
        // do something with those rows here...

        // get the next batch
        return read();
      });
    })();
  });
}

// call this from inside an async function (or a module that supports top-level await)
const isComplete = await processResults();
Upvotes: 4