Horace
Horace

Reputation: 1

Updating many rows

I have a DataTable with around 500K+ records, and I would like to update a field in the database for each of those records contained in the DataTable.

At the moment, I am getting a list of all the records in the ID column of the DataTable and creating an update statement with an SQL in statement on those IDs, like so:

string sql = "UPDATE my_table SET my_field = @timestamp WHERE id IN (" + String.Join(", ", myDataTable.AsEnumerable().Select(r => r.Field<Int64>("id")).ToList().ToArray()) + ")";

This works for a few records, however due to the large amount of IDs, I am now getting an error saying that the packet size in MySQL is not large enough for this query.

What can be done, besides increasing the packet size, which seems to be a bad approach?

Upvotes: 0

Views: 162

Answers (3)

Aleksei
Aleksei

Reputation: 591

For similar task I used the following approach once:

  1. Create additional table with columns session_id GUID, record_id bigint.
  2. Before starting update operation generate unique identifier for this operation (session_id).
  3. Insert all ids you have to update into this table along with generated session id.
  4. UPDATE primary table using INNER JOIN to this table, specifying particular session id.
  5. DELETE all records from session table by session id.

When it's done you may start working on performance:

  • turning logging for session table off, since it doesn't contain any essential data;
  • experiment with creating temp table each time you need it instead of using static one (in theory it would simplify cleaning up as DROP TABLE has to work much faster then DELETE FROM, but in my case creating table worked too slow so I preferred using permanent table);
  • use bigint instead of GUID-s for identifying sessions as they have to be compared faster;
  • use COPY FROM (PostgreSQL) or BULK INSERT (MS SQL) statement instead of series of INSERT INTO-s. I didn't find anything similar for MySQL though.

I don't know what will suit you best )

Upvotes: 1

Sloan Thrasher
Sloan Thrasher

Reputation: 5040

Assuming id_list_table contains a list of IDs to update in my_table:

UPDATE my_table
SET my_field = @timestamp
JOIN id_list_table
ON my_table.id = id_list_table.my_table_id

Upvotes: 0

Abdullah Dibas
Abdullah Dibas

Reputation: 1507

I don't know if it is possible to increase the packet size in MySQL, but there is a simple solution to your problem by dividing your UPDATE query to multiple UPDATE queries for multiple groups of IDs

Upvotes: 0

Related Questions