Ali

Reputation: 7483

Need to update a MySQL table with Millions of rows

I have two tables here:

Cities
Region | City Name

States
ID | State | Region_Key

I need to run an update query on the cities table, i.e. set cities.region = states.id where states.region_key = cities.region

The problem is that the cities table has over 2.7 million records. I tried a query like the one below, only for MySQL to hang and die.

update cities c, states s set c.region = s.id where c.region = s.region_key

EDIT ===================

This is the SQL I am using, but it's not working. I get an error saying "Incorrect usage of UPDATE and LIMIT":

update cities w, states s 
set w.region_id = s.id, 
w.updated = 1 
where w.region = s.w_code and w.updated = 0
LIMIT 10000

Upvotes: 2

Views: 4667

Answers (2)

sanmai

Reputation: 30881

Use CREATE TABLE ... SELECT to create a new table with the desired content, then use RENAME TABLE to move the old table aside and give the newly created table the proper name:

CREATE TABLE new_cities SELECT 
   states.id AS region_id, cities.name 
FROM cities JOIN states ON cities.region = states.w_code;

RENAME TABLE cities TO old_cities, new_cities TO cities;

Upvotes: 1

Yaakov Ellis

Reputation: 41490

  1. Add a nullable bit column [HasBeenUpdated] to the cities table
  2. Add c.HasBeenUpdated = 1 to the SET clause of the update
  3. Add the following WHERE condition: AND c.HasBeenUpdated IS NULL
  4. Add another WHERE condition: AND c.ID IN (SELECT ID FROM cities WHERE HasBeenUpdated IS NULL LIMIT 10000). This is needed because you cannot use LIMIT on a multi-table UPDATE. Note that MySQL may reject LIMIT placed directly inside an IN subquery; wrapping that subquery in a derived table is the usual workaround. This also presumes that you have an ID column as the PK for cities (if not, consider adding one). Now the update statement will only process 10,000 rows at a time (and will only touch unprocessed rows).

If you can put this in a loop using your application logic, then this can be used for automation. Change the limit number based on your needs and when it is done, remove the HasBeenUpdated column.
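As a rough illustration of that loop, here is a minimal sketch in Python. It runs against an in-memory SQLite database as a stand-in so that it is self-contained; it is not MySQL-specific code. The function name update_in_batches, the column names id, region_id and has_been_updated, and the sample data are all assumptions made for the example. The UPDATE is written with a correlated subquery because that form works in SQLite; on MySQL you would keep the multi-table UPDATE from the question and adapt the batch filter as described in step 4.

```python
import sqlite3


def update_in_batches(conn, batch_size=10000):
    """Run the region update in batches; return the total number of rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE cities
               SET region_id = (SELECT s.id FROM states s
                                 WHERE s.region_key = cities.region),
                   has_been_updated = 1
             WHERE id IN (SELECT c.id
                            FROM cities c
                            JOIN states s ON s.region_key = c.region
                           WHERE c.has_been_updated IS NULL
                           LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays small
        if cur.rowcount == 0:
            break  # nothing left that is both unprocessed and matchable
        total += cur.rowcount
    return total


# Hypothetical sample data: 25 cities that match a state, 1 that does not.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE states (id INTEGER PRIMARY KEY, state TEXT, region_key TEXT);
    CREATE TABLE cities (id INTEGER PRIMARY KEY, region TEXT, name TEXT,
                         region_id INTEGER, has_been_updated INTEGER);
    INSERT INTO states (id, state, region_key)
    VALUES (1, 'Alpha', 'A'), (2, 'Beta', 'B');
    """
)
rows = [("A" if i % 2 else "B", f"city{i}") for i in range(25)]
rows.append(("Z", "unmatched"))  # no state has region_key 'Z'
conn.executemany("INSERT INTO cities (region, name) VALUES (?, ?)", rows)
conn.commit()

updated = update_in_batches(conn, batch_size=10)
```

Committing after every batch is the point of the exercise: each transaction stays short, so locks are released between batches. Rows with no matching state are never selected, which mirrors the behaviour of the original join-based update.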

This should allow you to minimize the impact of the update on the table and database, and allow you to perform it across the whole table in manageable batches.

Edit: Updated step 4 to filter the rows to be updated via a subquery, since LIMIT cannot be used on a multi-table UPDATE.

Upvotes: 2
