Reputation: 22949
I'd like to perform a batch update using Knex.js
For example:
'UPDATE foo SET [theValues] WHERE idFoo = 1'
'UPDATE foo SET [theValues] WHERE idFoo = 2'
with values:
{ name: "FooName1", checked: true } // to `idFoo = 1`
{ name: "FooName2", checked: false } // to `idFoo = 2`
I was using node-mysql previously, which allowed multiple-statements. While using that I simply built a mulitple-statement query string and just send that through the wire in a single run.
I'm not sure how to achieve the same with Knex. I can see batchInsert
as an API method I can use, but nothing as far as batchUpdate
is concerned.
I can do an async iteration and update each row separately. That's bad cause it means there's gonna be lots of roundtrips from the server to the DB
I can use the raw()
thing of Knex and probably do something similar to what I do with node-mysql. However that defeats the whole knex purpose of being a DB abstraction layer (It introduces strong DB coupling)
So I'd like to do this using something "knex-y".
Any ideas welcome.
Upvotes: 27
Views: 53675
Reputation: 2889
Assuming you have a collection of valid keys/values for the given table:
// abstract transactional batch update
function batchUpdate(table, collection) {
return knex.transaction(trx => {
const queries = collection.map(tuple =>
knex(table)
.where('id', tuple.id)
.update(tuple)
.transacting(trx)
);
return Promise.all(queries)
.then(trx.commit)
.catch(trx.rollback);
});
}
To call it
batchUpdate('user', [...]);
Are you unfortunately subject to non-conventional column names? No worries, I got you fam:
function batchUpdate(options, collection) {
return knex.transaction(trx => {
const queries = collection.map(tuple =>
knex(options.table)
.where(options.column, tuple[options.column])
.update(tuple)
.transacting(trx)
);
return Promise.all(queries)
.then(trx.commit)
.catch(trx.rollback);
});
}
To call it
batchUpdate({ table: 'user', column: 'user_id' }, [...]);
Modern Syntax Version:
const batchUpdate = async (options, collection) => {
const { table, column } = options;
const trx = await knex.transaction();
try {
await Promise.all(collection.map(tuple =>
knex(table)
.where(column, tuple[column])
.update(tuple)
.transacting(trx)
)
);
await trx.commit();
} catch (error) {
await trx.rollback();
}
}
Upvotes: 28
Reputation: 65
The update can be done in batches, i.e 1000 rows in a batch
And as long as it does it in batches, the bluebird map could be used.
For more information on bluebird map: http://bluebirdjs.com/docs/api/promise.map.html
const limit = 1000;
const totalRows = 50000;
const seq = count => Array(Math.ceil(count / limit)).keys();
map(seq(totalRows), page => updateTable(dbTable, page), { concurrency: 1 });
const updateTable = async (dbTable, page) => {
let offset = limit* page;
return knex(dbTable).pluck('id').limit(limit).offset(offset).then(ids => {
return knex(dbTable)
.whereIn('id', ids)
.update({ date: new Date() })
.then((rows) => {
console.log(`${page} - Updated rows of the table ${dbTable} from ${offset} to ${offset + batch}: `, rows);
})
.catch((err) => {
console.log({ err });
});
})
.catch((err) => {
console.log({ err });
});
};
Where pluck() is used to get ids in array form
Upvotes: -1
Reputation: 706
The solution works great for me! I just include an ID parameter to make it dynamic across tables with custom ID tags. Chenhai, here's my snippet including a way to return a single array of ID values for the transaction:
function batchUpdate(table, id, collection) {
return knex.transaction((trx) => {
const queries = collection.map(async (tuple) => {
const [tupleId] = await knex(table)
.where(`${id}`, tuple[id])
.update(tuple)
.transacting(trx)
.returning(id);
return tupleId;
});
return Promise.all(queries).then(trx.commit).catch(trx.rollback);
});
}
You can use
response = await batchUpdate("table_name", "custom_table_id", [array of rows to update])
to get the returned array of IDs.
Upvotes: 1
Reputation: 532
I needed to perform a batch update inside a transaction (I didn't want to have partial updates in case something went wrong). I've resolved it the next way:
// I wrap knex as 'connection'
return connection.transaction(trx => {
const queries = [];
users.forEach(user => {
const query = connection('users')
.where('id', user.id)
.update({
lastActivity: user.lastActivity,
points: user.points,
})
.transacting(trx); // This makes every update be in the same transaction
queries.push(query);
});
Promise.all(queries) // Once every query is written
.then(trx.commit) // We try to execute all of them
.catch(trx.rollback); // And rollback in case any of them goes wrong
});
Upvotes: 36
Reputation: 2660
You have a good idea of the pros and cons of each approach. I would recommend a raw query that bulk updates over several async updates. Yes you can run them in parallel, but your bottleneck becomes the time it takes for the db to run each update. Details can be found here.
Below is an example of an batch upsert using knex.raw. Assume that records is an array of objects (one obj for each row we want to update) whose values are the properties names line up with the columns in the database you want to update:
var knex = require('knex'),
_ = require('underscore');
function bulkUpdate (records) {
var updateQuery = [
'INSERT INTO mytable (primaryKeyCol, col2, colN) VALUES',
_.map(records, () => '(?)').join(','),
'ON DUPLICATE KEY UPDATE',
'col2 = VALUES(col2),',
'colN = VALUES(colN)'
].join(' '),
vals = [];
_(records).map(record => {
vals.push(_(record).values());
});
return knex.raw(updateQuery, vals);
}
This answer does a great job explaining the runtime relationship between the two approaches.
Edit:
It was requested that I show what records
would look like in this example.
var records = [
{ primaryKeyCol: 123, col2: 'foo', colN: 'bar' },
{ // some other record, same props }
];
Please note that if your record
has additional properties than the ones you specified in the query, you cannot do:
_(records).map(record => {
vals.push(_(record).values());
});
Because you will hand too many values to the query per record and knex will fail to match the property values of each record with the ?
characters in the query. You instead will need to explicitly push the values on each record that you want to insert into an array like so:
// assume a record has additional property `type` that you dont want to
// insert into the database
// example: { primaryKeyCol: 123, col2: 'foo', colN: 'bar', type: 'baz' }
_(records).map(record => {
vals.push(record.primaryKeyCol);
vals.push(record.col2);
vals.push(record.colN);
});
There are less repetitive ways of doing the above explicit references, but this is just an example. Hope this helps!
Upvotes: 20