user3384166

Reputation: 1

Update query for 110 million records won't finish

We are using MySQL. We have a very long table (110m rows × 7 columns) and we regularly add new records to it (we do this by uploading CSV files, so it is never a really painful process even for a large number of records).

Now we need to add a new column to this table to distinguish the records in some way. Specifically, the most recently added ~10 million records will be flagged as type 2 and all the older records will be flagged as type 1. In the future, we will be inserting new records of both types.
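The split rule can be stated as a one-line predicate. This is just an illustration (the function name is made up; the boundary id 107480325 comes from the UPDATE statements in approach 1 below):

```python
def bp_name_version(bp_serial_id, boundary=107_480_325):
    """Old records (below the boundary serial id) are type 1; newer ones are type 2."""
    return 2 if bp_serial_id >= boundary else 1
```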

At first, we tried approach 1 below, but it ran for more than 24 hours without reporting any error, and the host server remained responsive the whole time.

-- ====================================================
-- start of approach 1
-- ====================================================

-- Add column.
ALTER TABLE 
  bond_price 
ADD 
  bp_name_version SMALLINT 
;


-- Set value 1 for BOND_CHARACTER types.
UPDATE 
  bond_price 
SET 
  bp_name_version = 1 
WHERE 
  bp_serial_id < 107480325 
;


-- Set value 2 for BOND_CHARACTER_EIKON types.
UPDATE 
  bond_price 
SET 
  bp_name_version = 2 
WHERE 
  bp_serial_id >= 107480325 
;


-- Set a NOT NULL constraint on the new column
ALTER TABLE 
  bond_price 
ALTER 
  bp_name_version SET NOT NULL 
;


-- ====================================================
-- END of approach 1
-- ====================================================

We lost faith after 24 hours, thinking the conditional clause might have made the update difficult on such a long table. So we tried approach 2, which is doing approach 1 step by step and without a conditional clause.

So we first executed the following query, and it finished within seconds.

-- Add column.
ALTER TABLE 
  bond_price 
ADD 
  bp_name_version SMALLINT 
;

We then executed the following query to set value 1 for all records, hoping to later change the value of just the 10 million newest records to 2.

-- Set value 1 for all records.
UPDATE 
  bond_price 
SET 
  bp_name_version = 1;

But this query has now been running for over 24 hours, again without reporting anything.

We have been monitoring the server by:

select * from pg_stat_activity;

And the 'Set value 1' query is still listed as active, and the server is still very responsive.

OUR QUESTIONS:

  1. Is this speed what one should expect, given that the table has over 100 million records?
  2. Is there a possibility that this query will never finish and just stay stuck forever? Is there any way to tell?
  3. Is there any way to improve the speed, or a different way to do this?

Many thanks in advance!

Upvotes: 0

Views: 293

Answers (1)

DRapp

Reputation: 48139

I don't know the basis of the data, but updating 110+ million records in one push is probably a bit nuts.

Why not find some other criterion, loop, and do the updates in smaller chunks... maybe based on some "Add" date, or some other field in the structure. Or just use the bp_serial_id, such as the following (again, driven by a loop):

pseudo-code to update at most 1 million rows per statement

maxSerialForType1 = 107480325

for cycle = 0 to 110
   startSerialID = cycle * 1000000
   endSerialID   = (cycle + 1) * 1000000

   -- the part of this chunk below the boundary gets type 1
   if startSerialID < maxSerialForType1
      UPDATE bond_price
         SET bp_name_version = 1
         WHERE bp_serial_id >= startSerialID
           AND bp_serial_id < endSerialID
           AND bp_serial_id < maxSerialForType1
   end if for update type 1

   -- the part of this chunk at or above the boundary gets type 2
   -- (note: endSerialID is tested here so the chunk that straddles
   --  the boundary is not skipped, and >= matches your type-2 rule)
   if endSerialID > maxSerialForType1
      UPDATE bond_price
         SET bp_name_version = 2
         WHERE bp_serial_id >= startSerialID
           AND bp_serial_id < endSerialID
           AND bp_serial_id >= maxSerialForType1
   end if for update type 2
end of loop

So this, if implemented as a stored procedure, will update the 110 million rows for you in manageable batches.
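To make the chunking arithmetic concrete, here is the same plan sketched as a Python generator. No real database connection is made; the SQL templates are illustrative placeholders, and the point is the range logic, which partitions all 111 million serial ids with no gap or overlap:

```python
MAX_SERIAL_FOR_TYPE1 = 107_480_325   # first bp_serial_id of the type-2 rows
CHUNK = 1_000_000
TOTAL_CHUNKS = 111                   # covers serial ids 0 .. 110,999,999

def batched_updates():
    """Yield (sql_template, low, high) for each batch, in serial-id order.

    Each yielded half-open range [low, high) is updated by one statement,
    so no single UPDATE touches more than 1 million rows.
    """
    for cycle in range(TOTAL_CHUNKS):
        low = cycle * CHUNK
        high = low + CHUNK
        # the part of this chunk below the boundary gets type 1
        if low < MAX_SERIAL_FOR_TYPE1:
            yield (
                "UPDATE bond_price SET bp_name_version = 1 "
                "WHERE bp_serial_id >= %s AND bp_serial_id < %s",
                low, min(high, MAX_SERIAL_FOR_TYPE1),
            )
        # the part of this chunk at/above the boundary gets type 2
        if high > MAX_SERIAL_FOR_TYPE1:
            yield (
                "UPDATE bond_price SET bp_name_version = 2 "
                "WHERE bp_serial_id >= %s AND bp_serial_id < %s",
                max(low, MAX_SERIAL_FOR_TYPE1), high,
            )
```

A driver would execute (and commit) each yielded statement in turn; committing per batch keeps each transaction small, which is what makes the chunked approach cheaper than one giant UPDATE.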

I would then make a suggestion: when importing new records, load them into a temp table with the bp_name_version value already assigned, THEN pull them into the final table, so you don't have to go through updating 110+ million rows each time.
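An even simpler variant of that idea is to tag each CSV row with its version before loading it at all, so no post-load UPDATE is ever needed. A minimal sketch (file names and column order are assumptions for illustration):

```python
import csv

def tag_csv(src_path, dst_path, version):
    """Copy a CSV, appending the bp_name_version value as a final column."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow(row + [version])  # append the version column
```

The tagged file can then be bulk-loaded as usual (e.g. with LOAD DATA INFILE), with the new column mapped like any other.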

Upvotes: 1
