Reputation: 525
I am trying to move some data from a database to another. I am currently having over a million entries in my database and I was expecting this to take long but already passed 50min and no result :) . Here is my query:
UPDATE xxx.product AS p
LEFT JOIN xx.tof_art_lookup AS l ON p.model_view = l.ARL_SEARCH_NUMBER
SET p.model = l.ARL_DISPLAY_NR
WHERE p.model_view = l.ARL_SEARCH_NUMBER;
Any help on how to improve this query will be welcome. Thanks in advance!
Upvotes: 0
Views: 583
Reputation: 10236
Identify what makes slow.
run SELECT only:
SELECT COUNT(*)
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l
ON p.model_view = l.ARL_SEARCH_NUMBER;
how long takes? and EXPLAIN SELECT ...
check proper INDEX
is used for JOIN.
If everything is fine for JOIN, then UPDATEing row is slow. this situation is hard to make things faster.
I didn't tried this. but sometimes, this strategy is faster.. UPDATE
is DELETE old row and INSERT new row using new value.
// CREATE new table and INSERT
CREATE TABLE xxx.new_product
SELECT p.model_model, l. ARL_DISPLAY_NR, ...
FROM xxx.product p LEFT JOIN xx.tof_art_lookup l
ON p.model_view = l.ARL_SEARCH_NUMBER;
// drop xxx.procuct
// rename xxx.new_product to xxx.product
I think your job is CPU bounded and your UPDATE query uses just one CPU can't have benefit many cores. xxx.product TABLE has no constraint for join, there for 1M rows are updated sequencially
My suggestion following.
give some conditions to xxx.product so that xxx.product divided 20 group. (I don't no which column would be better for you, as I have no information about xxx.product)
then run 20 queries at once concurrently.
for example:
// for 1st chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val1 AND val2; <= this condition spliting xxx.product
// for 2nd chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val2 AND val3;
...
...
// for 20th chunk
UPDATE xxx.product AS p
...
WHERE p.model_view = l.ARL_SEARCH_NUMBER
AND p.column BETWEEN val19 AND val20;
It is important to find BETWEEN value distribute table evenly. Histogram may help you. Getting data for histogram plot
Upvotes: 1
Reputation: 3036
Indexes on p.model_view, l.ARL_SEARCH_NUMBER if you not gonna get rid of JOINs.
Actually, it might be optimized depending on actual data amounts and their values (NULLs presence) by use of:
1. Monitoring query execution plan and , if it's not good, putting query hints for compiler or exchange JOINs for subqueries so compiler uses another type of join inside it (merge/nested loops/hashs/whatever)
2. Making a stored procedure with more comlicated but faster logic
3. Doing updates by small portions
Upvotes: 2