Aries On The Cusp
Aries On The Cusp

Reputation: 419

Update with join takes 9 to 30 hours for some reason?

I have two tables, lets called them table_A and table_B. Table_A has about 10 million rows. Table_B has 300 million rows. I created indexes on Table_B columns X and Y.

Is it normal that the updates would take this long with the number of rows I have ? This seems extremely long to me considering I have an index

Here is what they look like

Table A:

ID   BCODE 
1    A1 
2    B1  
3    C1  
4    D1  
5    F1 

Table B:

X    Y    IDX  IDY
A1   D1  
D1   F1  
C1   B1  

Table_B has columns 'X' and 'Y' that have values that are found in Table_A's column BCODE.

I run two update statements that take 9 to 30 hours to complete.

 Update Table_B
join table_A a on table_B.X = a.BCODE
set
   table_B.IDX=a.Id ;

Update Table_B
join Table_A aa on table_B.Y = aa.BCODE
set
   table_B.IDY = aa.Id  ;

Is there anyway I can speed this up ? I should add that the BCODE, X, Y columns can be up to 300 characters long

Here are the results of Explain:

+----+-------------+-----------+------+---------------+-------------+---------+--------------+----------+-------------+
| id | select_type | table     | type | possible_keys | key         | key_len |  ref         | rows     | Extra       |
+----+-------------+-----------+------+---------------+-------------+---------+--------------+----------+-------------+
|  1 | SIMPLE      | a         | ALL  | BCODE         | NULL        | NULL    | NULL         | 10238784 | NULL        |
|  1 | SIMPLE      | table_B   | ref  | relateIndex   | relateIndex | 632     | test.a.BCODE |       15 | Using where |
+----+-------------+-----------+------+---------------+-------------+---------+------------+----------+-------------+
2 rows in set (0.00 sec)

Upvotes: 1

Views: 77

Answers (2)

Bohemian
Bohemian

Reputation: 425258

Create a covering index:

create index table_a_bcode_id on table_A(bcode, id);

Covering indexes provide the looked up value(s) in the index, avoiding access to the table - making an index-only query possible.

Your queries look OK, but try combining them:

update Table_B
left join table_A a1 on a1.BCODE = table_B.X
left join table_A a2 on a2.BCODE = table_B.Y
set table_B.IDX = a1.Id
    table_B.IDY = a2.Id

This avoids having to update the row twice.

Upvotes: 4

Ashalynd
Ashalynd

Reputation: 12573

From the EXPLAIN output, it looks like Table_A.BCODE does not have an index on BCODE. If you want to do a JOIN, both tables should be indexed on the JOIN fields, otherwise the non-indexed table will slow things down because of the O(N) complexity of lookups.

Apart from that, you have mentioned that both columns are strings up to 300 characters long. Indices on such columns are not very effective. If you could find a way to hash the values in these columns and then perform join by hashed values, it could speed things up.

On the other side, with tables that large, you should check if the indices even fit into memory. If they don't, then MySQL will start swapping your index to disk, which would tremendously slow things down.

We can do an estimation: 600 (your index key length) x 300M (number of rows) = 200 GB... So quite likely this is the issue.

Possible workaround: try partitioning your table_B (e.g.by PK) so that for every part the index could fit into memory (it could mean 25 partitions or more, assuming your MySQL instance could use up to 8 GB RAM).

Upvotes: 2

Related Questions