Reputation: 1869
I tested an update between two large tables (~5 million records each) which was taking 10 seconds or so per update. So, using Explain for my very first time, I tested the select:
SELECT
T1.Z, T2.Z
FROM
TableB T1
INNER JOIN TableL T2
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber
Explain returned several possible keys and chose C as the key.
I was told that my best bet was to make a compound key, in the order of the select, so I did:
Alter Table TableB Add Index Compound (Name,C,S,Number)
I then ran Explain again, hoping it would choose my compound index, but even though it now shows the compound index as a possible key, it still chooses index C.
I read that I can force the index I want with:
SELECT
T1.Z, T2.Z
FROM TableB T1 Force Index(Compound)
INNER JOIN TableL T2
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber
Yet I am not sure it makes sense to override MySql's selection, and given that, if it doesn't help, the update is going to take almost two years, it doesn't seem like a smart thing to test.
Is there some step I am missing? Do I need to remove the other keys so that it chooses my compound one, and if so, how will I know if it will even make a difference (given that MySql saw it and rejected it)?
Explain output on T1 (note: I have not yet added the Compound index, as the table is huge and it might be wasted time until I figure this out; I previously added it on a highly truncated version of the table, but that won't help with this Explain):
Explain for Table2
Cardinality (only showing indexes relevant here as there are a few others):
Primary: 5139680
Name: 1284920
Number: 57749
C: 7002
S: 21
Upvotes: 0
Views: 118
Reputation: 142298
Note that the first table (whichever one it is) must be fully scanned. So, the best we can do is to have a good index on the second table.
The optimal index (as already noted) for T1 is `(Name,C,S,Number)`. For T2 it is `(Name,C,S,MinNumber,MaxNumber)`, which is bulkier.
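As DDL, those two recommendations would look something like this (the index names are placeholders, not anything from the original tables):

```sql
-- Placeholder index names; column order follows the recommendation above.
ALTER TABLE TableB ADD INDEX idx_name_c_s_number (Name, C, S, Number);
ALTER TABLE TableL ADD INDEX idx_name_c_s_minmax (Name, C, S, MinNumber, MaxNumber);
```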
The optimizer seems to want to start with T1; perhaps it is slightly smaller. Let's force it to start with T2 by changing `INNER JOIN` to `STRAIGHT_JOIN` and swapping the order:
SELECT
T1.Z, T2.Z
FROM TableL T2 -- note
STRAIGHT_JOIN TableB T1 -- note
on T1.Name=T2.Name
and T1.C=T2.C
and T1.S=T2.S
and T1.Number>=T2.MinNumber
and T1.Number<=T2.MaxNumber
Then, let's do one more optimization: if `Z` is not 'too big', include it at the end of the index so that it becomes a "covering index": `INDEX(Name,C,S,Number,Z)`. (`Name`, `C`, `S` can be in any order, but `Number`, `Z` must be in that order and at the end.) If you currently have `INDEX(Name)`, `DROP` it as being redundant.
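Sketched as DDL (the covering-index name is a placeholder; check the actual name of the single-column index with `SHOW INDEX FROM TableB` before dropping anything):

```sql
-- Covering index: Name, C, S in any order, then Number, then Z last.
ALTER TABLE TableB ADD INDEX idx_covering (Name, C, S, Number, Z);
-- Drop the now-redundant single-column index, if one exists:
ALTER TABLE TableB DROP INDEX Name;
```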
Then the `EXPLAIN` will say that you are doing a full table scan of T2, plus a "Using index" on T1.
Please provide `SHOW CREATE TABLE`; there may be more optimizations.
Upvotes: 0
Reputation: 1869
So, based on some great comments/input, I came up with a solution. One flashbulb input from Paul Spiegel was that trying to join two 5+ million-record tables on several VarChar fields was not recommended.
So what I did was create a `Unique` table with `ID` and `UniqueRecord` fields.
I then made `UniqueRecord` a unique index.
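A minimal sketch of that table (the column types and the VarChar length are assumptions; size them to your data):

```sql
-- Assumed types; UniqueRecord must be wide enough for Concat(Name, C, S).
CREATE TABLE `Unique` (
    ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
    UniqueRecord VARCHAR(255) NOT NULL,
    PRIMARY KEY (ID),
    UNIQUE KEY (UniqueRecord)
);
```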
I inserted into that table from both Table1 and Table2:
Insert IGNORE into `Unique` (UniqueRecord)
Select Concat(Name,C,S) from Table1 Group by Name,C,S;
Insert IGNORE into `Unique` (UniqueRecord)
Select Concat(Name,C,S) from Table2 Group by Name,C,S;
This gave me unique records from both within and between the two tables.
I then added a UniqueRecord_ID field to both Table1 and Table2.
I then did a join between each table and the UniqueRecord to write the UniqueRecord ID to each table:
Update Table1 as T1
Inner Join Unique as T2
On Concat(T1.Name,T1.C,T1.S) = T2.UniqueRecord
Set T1.UniqueRecord_ID=T2.ID
Finally, I added a key to each table on UniqueRecord_ID.
My Explain showed that it only used that key from T2; however, whereas the select was previously taking 10 seconds per record (I tested on 1, 10, and 100 records and stopped there, as I did not have the requisite 578 days to test the whole table :| ), the entire select, returning close to 5 million records, took 72 seconds.
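For reference, the final select then joins on the single integer surrogate key instead of the three VarChar columns; something like this sketch, assuming both tables have the `UniqueRecord_ID` column and index described above:

```sql
-- Join on the integer surrogate key; the Number range condition still applies.
SELECT
    T1.Z, T2.Z
FROM Table1 T1
INNER JOIN Table2 T2
    ON T1.UniqueRecord_ID = T2.UniqueRecord_ID
    AND T1.Number >= T2.MinNumber
    AND T1.Number <= T2.MaxNumber;
```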
Upvotes: 3