reefine
reefine

Reputation: 853

MySQL Join Best Practice on Large Data

table1_shard1 (1,000,000 rows per shard x 120 shards)

 id_user   hash

table2 (100,000 rows)

 value    hash

Desired Output:

 id_user  hash    value

I am trying to find the fastest way to associate id_user with value from the tables above.

My current query ran for 30 hours without result.

SELECT 
    table1_shard1.id_user, table1_shard1.hash, table2.value 
FROM table1_shard1 
LEFT JOIN table2 ON table1_shard1.hash=table2.hash 
GROUP BY id_user
UNION
SELECT 
    table1_shard2.id_user, table1_shard2.hash, table2.value 
FROM table1_shard1 
LEFT JOIN table2 ON table1_shard2.hash=table2.hash 
GROUP BY id_user
UNION 
( ... )
UNION 
SELECT 
    table1_shard120.id_user, table1_shard120.hash, table2.value 
FROM table1_shard1 
LEFT JOIN table2 ON table1_shard120.hash=table2.hash 
GROUP BY id_user

Upvotes: 1

Views: 1751

Answers (1)

Pierre de LESPINAY
Pierre de LESPINAY

Reputation: 46208

Firstly, do you have indexes on the hash fields

I think you should merge your tables in one before the query (at least temporarily)

CREATE TEMPORARY TABLE IF NOT EXISTS tmp_shards
SELECT * FROM table1_shard1;

CREATE TEMPORARY TABLE IF NOT EXISTS tmp_shards
SELECT * FROM table1_shard2;

# ...

Then do the main query

SELECT
  table1_shard120.id_user
, table1_shard120.hash
, table2.value
FROM tmp_shards AS shd
LEFT JOIN table2 AS tb2 ON (shd.hash = tb2.hash)
GROUP BY id_user
;

Not sure for the performance gain but it'll be at least more maintainable.

Upvotes: 1

Related Questions