John Tatu
John Tatu

Reputation: 525

Slow MySQL query on update statement

I am trying to move some data from a database to another. I am currently having over a million entries in my database and I was expecting this to take long but already passed 50min and no result :) . Here is my query:

UPDATE xxx.product AS p 
LEFT JOIN xx.tof_art_lookup AS l ON p.model_view = l.ARL_SEARCH_NUMBER 
SET p.model = l.ARL_DISPLAY_NR 
WHERE p.model_view = l.ARL_SEARCH_NUMBER;

Any help on how to improve this query will be welcome. Thanks in advance!

Upvotes: 0

Views: 583

Answers (2)

Jason Heo
Jason Heo

Reputation: 10236

Identify what makes slow.

check JOIN is optimized

run SELECT only:

SELECT COUNT(*)
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l 
  ON p.model_view = l.ARL_SEARCH_NUMBER;

how long takes? and EXPLAIN SELECT ... check proper INDEX is used for JOIN.

If everything is fine for JOIN, then UPDATEing row is slow. this situation is hard to make things faster.

UPDATE = DELETE and INSERT

I didn't tried this. but sometimes, this strategy is faster.. UPDATE is DELETE old row and INSERT new row using new value.

// CREATE new table and INSERT
CREATE TABLE xxx.new_product
SELECT p.model_model, l. ARL_DISPLAY_NR, ... 
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l 
  ON p.model_view = l.ARL_SEARCH_NUMBER;

// drop xxx.procuct
// rename xxx.new_product to xxx.product

divide table into small chunk, and run concurrently

I think your job is CPU bounded and your UPDATE query uses just one CPU can't have benefit many cores. xxx.product TABLE has no constraint for join, there for 1M rows are updated sequencially

My suggestion following.

give some conditions to xxx.product so that xxx.product divided 20 group. (I don't no which column would be better for you, as I have no information about xxx.product)

then run 20 queries at once concurrently.

for example:

// for 1st chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val1 AND val2; <= this condition spliting xxx.product

// for 2nd chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val2 AND val3;

...
...

// for 20th chunk
UPDATE xxx.product AS p 
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
  AND p.column BETWEEN val19 AND val20;

It is important to find BETWEEN value distribute table evenly. Histogram may help you. Getting data for histogram plot

Upvotes: 1

LINQ2Vodka
LINQ2Vodka

Reputation: 3036

Indexes on p.model_view, l.ARL_SEARCH_NUMBER if you not gonna get rid of JOINs.
Actually, it might be optimized depending on actual data amounts and their values (NULLs presence) by use of:
1. Monitoring query execution plan and , if it's not good, putting query hints for compiler or exchange JOINs for subqueries so compiler uses another type of join inside it (merge/nested loops/hashs/whatever)
2. Making a stored procedure with more comlicated but faster logic
3. Doing updates by small portions

Upvotes: 2

Related Questions