nicholaswmin

Reputation: 22949

Batch update in knex

I'd like to perform a batch update using Knex.js

For example:

'UPDATE foo SET [theValues] WHERE idFoo = 1'
'UPDATE foo SET [theValues] WHERE idFoo = 2'

with values:

{ name: "FooName1", checked: true } // to `idFoo = 1`
{ name: "FooName2", checked: false } // to `idFoo = 2`

I was previously using node-mysql, which allows multiple statements. With it I simply built a multiple-statement query string and sent it over the wire in a single run.

I'm not sure how to achieve the same with Knex. I can see batchInsert as an API method I can use, but nothing as far as batchUpdate is concerned.
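For context, the node-mysql version was built roughly like this (a sketch; `buildBatchUpdate` and the naive `esc` helper are illustrative only — the real code used the driver's own escaping):

```javascript
// Naive escaper for illustration only -- node-mysql's own escaping
// (mysql.format / connection.escape) should be used in real code.
const esc = v =>
  typeof v === 'string' ? `'${v.replace(/'/g, "''")}'` : String(v);

// Build one UPDATE per row and join them into a single
// multiple-statement string.
function buildBatchUpdate(table, idCol, rows) {
  return rows
    .map(row => {
      const sets = Object.entries(row)
        .filter(([col]) => col !== idCol)
        .map(([col, val]) => `${col} = ${esc(val)}`)
        .join(', ');
      return `UPDATE ${table} SET ${sets} WHERE ${idCol} = ${esc(row[idCol])}`;
    })
    .join(';\n');
}

// connection.query(buildBatchUpdate('foo', 'idFoo', rows), cb); // one round-trip
```

This only works when the connection is created with `multipleStatements: true`.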

Note: I'd like to do this using something "knex-y".

Any ideas welcome.

Upvotes: 27

Views: 53675

Answers (5)

jiminikiz

Reputation: 2889

Assuming you have a collection of valid keys/values for the given table:

// abstract transactional batch update
function batchUpdate(table, collection) {
  return knex.transaction(trx => {
    const queries = collection.map(tuple =>
      knex(table)
        .where('id', tuple.id)
        .update(tuple)
        .transacting(trx)
    );
    return Promise.all(queries)
      .then(trx.commit)    
      .catch(trx.rollback);
  });
}

To call it

batchUpdate('user', [...]);

Are you unfortunately subject to non-conventional column names? No worries, I got you fam:

function batchUpdate(options, collection) {
  return knex.transaction(trx => {
    const queries = collection.map(tuple =>
      knex(options.table)
        .where(options.column, tuple[options.column])
        .update(tuple)
        .transacting(trx)
    );
    return Promise.all(queries)
      .then(trx.commit)    
      .catch(trx.rollback);
  });
}

To call it

batchUpdate({ table: 'user', column: 'user_id' }, [...]);

Modern Syntax Version:

const batchUpdate = async (options, collection) => {
  const { table, column } = options;
  const trx = await knex.transaction();
  try {
    await Promise.all(collection.map(tuple =>
      knex(table)
        .where(column, tuple[column])
        .update(tuple)
        .transacting(trx)
    ));
    await trx.commit();
  } catch (error) {
    await trx.rollback();
    throw error; // surface the failure to the caller instead of swallowing it
  }
};

Upvotes: 28

Shayan Shaikh

Reputation: 65

The update can be done in batches, e.g. 1,000 rows per batch.

Bluebird's Promise.map can drive the batches with bounded concurrency.

For more information on Promise.map: http://bluebirdjs.com/docs/api/promise.map.html

const Promise = require('bluebird');

const dbTable = 'my_table'; // placeholder table name
const limit = 1000;         // rows per batch
const totalRows = 50000;

// 0, 1, 2, ... -- one page index per batch
const seq = count => [...Array(Math.ceil(count / limit)).keys()];

const updateTable = async (dbTable, page) => {
  const offset = limit * page;

  return knex(dbTable).pluck('id').limit(limit).offset(offset)
    .then(ids => {
      return knex(dbTable)
        .whereIn('id', ids)
        .update({ date: new Date() })
        .then((rows) => {
          console.log(`${page} - Updated rows of the table ${dbTable} from ${offset} to ${offset + limit}: `, rows);
        });
    })
    .catch((err) => {
      console.log({ err });
    });
};

Promise.map(seq(totalRows), page => updateTable(dbTable, page), { concurrency: 1 });

Where pluck() is used to get ids in array form

Upvotes: -1

Ryan Brockhoff

Reputation: 706

The solution works great for me! I just included an ID parameter to make it dynamic across tables with custom ID column names. Chenhai, here's my snippet, including a way to return a single array of ID values for the transaction:

function batchUpdate(table, id, collection) {
  return knex.transaction((trx) => {
    const queries = collection.map(async (tuple) => {
      const [tupleId] = await knex(table)
        .where(id, tuple[id])
        .update(tuple)
        .transacting(trx)
        .returning(id);

      return tupleId;
    });

    return Promise.all(queries).then(trx.commit).catch(trx.rollback);
  });
}

You can use response = await batchUpdate("table_name", "custom_table_id", [array of rows to update]) to get the returned array of IDs.

Upvotes: 1

A Bravo Dev

Reputation: 532

I needed to perform a batch update inside a transaction (I didn't want to have partial updates in case something went wrong). I resolved it the following way:

// I wrap knex as 'connection'
return connection.transaction(trx => {
    const queries = [];
    users.forEach(user => {
        const query = connection('users')
            .where('id', user.id)
            .update({
                lastActivity: user.lastActivity,
                points: user.points,
            })
            .transacting(trx); // This makes every update be in the same transaction
        queries.push(query);
    });

    return Promise.all(queries) // Once every query has been built
        .then(trx.commit) // We try to execute all of them
        .catch(trx.rollback); // And rollback in case any of them goes wrong
});

Upvotes: 36

Patrick Motard

Reputation: 2660

You have a good idea of the pros and cons of each approach. I would recommend a raw query that bulk updates over several async updates. Yes you can run them in parallel, but your bottleneck becomes the time it takes for the db to run each update. Details can be found here.

Below is an example of a batch upsert using knex.raw. Assume that records is an array of objects (one object for each row we want to update) whose property names line up with the columns in the database table you want to update:

var knex = require('knex'),
    _ = require('underscore');

function bulkUpdate(records) {
  var updateQuery = [
    'INSERT INTO mytable (primaryKeyCol, col2, colN) VALUES',
    _.map(records, () => '(?)').join(','),
    'ON DUPLICATE KEY UPDATE',
    'col2 = VALUES(col2),',
    'colN = VALUES(colN)'
  ].join(' ');

  var vals = [];

  _(records).map(record => {
    vals.push(_(record).values());
  });

  return knex.raw(updateQuery, vals);
}

This answer does a great job explaining the runtime relationship between the two approaches.

Edit:

It was requested that I show what records would look like in this example.

var records = [
  { primaryKeyCol: 123, col2: 'foo', colN: 'bar' },
  // ...some other record, same props
];
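To make the placeholder expansion concrete, here is what the `updateQuery` string looks like for two records, reproduced with the same join logic as the function above (plain `Array.prototype.map` stands in for underscore's `_.map`):

```javascript
var records = [
  { primaryKeyCol: 123, col2: 'foo', colN: 'bar' },
  { primaryKeyCol: 456, col2: 'baz', colN: 'qux' },
];

// One '(?)' group per record, joined with commas.
var updateQuery = [
  'INSERT INTO mytable (primaryKeyCol, col2, colN) VALUES',
  records.map(() => '(?)').join(','),
  'ON DUPLICATE KEY UPDATE',
  'col2 = VALUES(col2),',
  'colN = VALUES(colN)'
].join(' ');
```

Each `?` is later bound to one record's array of values by knex.raw.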

Please note that if your record has additional properties beyond the ones you specified in the query, you cannot do:

  _(records).map(record => {
      vals.push(_(record).values());
  });

Because you will hand too many values to the query per record, and knex will fail to match each record's property values with the `?` placeholders in the query. Instead, you will need to explicitly push the values of each record that you want to insert into an array, like so:

  // assume a record has additional property `type` that you dont want to
  // insert into the database
  // example: { primaryKeyCol: 123, col2: 'foo', colN: 'bar', type: 'baz' }
  _(records).map(record => {
      vals.push(record.primaryKeyCol);
      vals.push(record.col2);
      vals.push(record.colN);
  });

There are less repetitive ways of doing the above explicit references, but this is just an example. Hope this helps!
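One less repetitive variant (a sketch; the `cols` whitelist is mine, not from the original answer) keeps the column list in a single array, so extra properties like `type` are dropped automatically:

```javascript
// Columns the query expects, in placeholder order.
var cols = ['primaryKeyCol', 'col2', 'colN'];

var records = [
  // `type` is not in `cols`, so it never reaches `vals`
  { primaryKeyCol: 123, col2: 'foo', colN: 'bar', type: 'baz' }
];

var vals = [];
records.forEach(record => {
  cols.forEach(col => vals.push(record[col]));
});
```

Adding a column then only requires touching `cols` and the query string, not the push logic.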

Upvotes: 20
