Martin Blore
Martin Blore

Reputation: 2195

Mass update of data in table - best method?

I have a query similar to the below:

UPDATE customers c SET AssetsTotal = f_SumAssetCosts(c.Id);

I'm concerned that MySQL may be looking at the entire table and placing a lock for update on the entire data set. So if this single UPDATE statement happens for millions of customers and the f_SumAssetCosts has even just a small amount of time to process, this entire process could lock the table for whole seconds, potentially minutes. This isn't desirable in a high transaction system where the customers rows may be read/update quite frequently.

First, would MySQL be placing a lock (for UPDATE/DELETE) on the entire table whilst the entire operation is running?

Second, just to confirm my knowledge, is it still ok to SELECT from this table even during the UPDATE? I'm assuming the UPDATE statement hasn't yet "committed" its change so any SELECT's ran during will return the last known committed values? It's not going to lock the reads right and wait for the UPDATE to finish?

I'm thinking to break out the entire operation in a SELECT Id FROM customers in to a temporary table, then cursor loop over the table so that one UPDATE... call is ran and not doing any enormous locks (provided my assumptions about a huge lock above are true).

Upvotes: 0

Views: 849

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269963

If you are concerned about both the cost of the function and ongoing transactions and locking, then you might want to do this in batches. If the value is currently NULL and the new values are not, then this is easy:

UPDATE customers 
    SET AssetsTotal = f_SumAssetCosts(c.Id)
    WHERE AssetsTotal IS NOT NULL
    LIMIT 1000;  -- or whatever

This will update a fraction of the customers each time, limiting the locking and the period when the table is not available.

This gets tricky if the value is not starting off as NULL. In that case, you could set it to NULL . . . somewhat expensive but cheaper than the function call.

There are other methods depending on the specific requirements:

  • Add a column to specify if the column has been updated.
  • Regenerate the entire table, use truncate and re-insert/rename the regenerating table.
  • Do all the calculations in another table, use join to bring them in, and delete from the other table.

Considerations are:

  • Do all the updates have to appear at the same time?
  • What is the transactional load?
  • How expensive is the function?

Upvotes: 1

Related Questions