Ian

Reputation: 354

About the SQL performance of SELECT ... IN

MySQL 5.7.21

I use a connection pool to connect to the database and run the SQL:

let mysql = require('mysql');
let pool = mysql.createPool(db);

pool.getConnection((err, conn) => {
  if (err) {
    // ...
  } else {
    console.log('allConnections:' + pool._allConnections.length);
    let q = conn.query(sql, val, (err, rows, fields) => {
      // ...
    });
  }
});

I have a table with around 1,000,000 records. I wrote a SELECT to fetch the matching records.

select * from tableA where trackingNo in (?)

I send the trackingNo values as an array parameter. The array holds around 20,000 tracking numbers.
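
For context, the call looks roughly like this. It is only a sketch: trackingNos and the callback body are placeholder names, and it relies on the mysql package's documented behavior of expanding an array bound to a ? placeholder into a quoted, comma-separated list.

// Sketch only: trackingNos stands in for the ~20,000-element array.
// The mysql package turns an array bound to "?" into 'v1', 'v2', ...,
// so the statement becomes "... where trackingNo in ('v1', 'v2', ...)".
let sql = 'select * from tableA where trackingNo in (?)';
let trackingNos = ['TN0001', 'TN0002' /* , ... ~20,000 values */];

pool.query(sql, [trackingNos], (err, rows) => {
  if (err) throw err;
  console.log('matched rows: ' + rows.length);
});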

I created an index on the trackingNo column. (The column is a varchar, not unique, and can contain null, blank, and any other value.)

The problem is that it takes around 5 minutes to get the results. Those 5 minutes are purely backend SQL execution time, which seems too slow for matching 20,000 values against 1,000,000 records. Do you have any suggestions for SELECT ... IN?

Explain SQL:

id  select_type  table   partitions  type  possible_keys          key   key_len  ref   rows    filtered  Extra
1   SIMPLE       tableA  null        ALL   table_tracking_no_idx  null  null     null  999507  50        Using where

Upvotes: 0

Views: 65

Answers (2)

Gordon Linoff

Reputation: 1269773

MySQL creates a binary search tree for IN lists that are composed of constants. As explained in the documentation:

If all values are constants, they are evaluated according to the type of expr and sorted. The search for the item then is done using a binary search. This means IN is very quick if the IN value list consists entirely of constants.

In general, creating a separate table with constants does not provide much improvement in performance.

I suppose there could be some subtle issue with type compatibility -- such as collations -- that interferes with this process.
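One way to rule that out is to check the column's declared type, character set, and collation directly (a sketch using the table and column names from the question):

-- Show the declared type, collation, and nullability of trackingNo
SHOW FULL COLUMNS FROM tableA LIKE 'trackingNo';

-- Or read the same information from information_schema
SELECT DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME, IS_NULLABLE
FROM information_schema.COLUMNS
WHERE TABLE_NAME = 'tableA' AND COLUMN_NAME = 'trackingNo';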

This type of query probably requires a full table scan. If the rows are wide, then the combination of the scan and returning the data may be accounting for the performance. I do agree that five minutes is a long time, but it could be entirely due to the network connection between the app/GUI and the database.

Upvotes: 0

Tim Biegeleisen

Reputation: 521249

You could consider populating a table with the tracking numbers you want to match. Then, you could use an inner join instead of your current WHERE IN approach:

SELECT *
FROM tableA a
INNER JOIN tbl b
    ON a.trackingNo = b.trackingNo;

This has the advantage that you may index the new tbl table on the trackingNo column to make the join lookup extremely fast.

This assumes that tbl would have a single column trackingNo which contains the 20K+ values you need to consider.
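
A sketch of that setup, assuming a temporary staging table; the VARCHAR(50) length here is a placeholder and should match the definition and collation of tableA.trackingNo:

-- Hypothetical staging table for the ~20,000 tracking numbers
CREATE TEMPORARY TABLE tbl (
    trackingNo VARCHAR(50),               -- placeholder length; match tableA.trackingNo
    INDEX idx_trackingNo (trackingNo)
);

-- Load the values (batched from the application in practice)
INSERT INTO tbl (trackingNo) VALUES ('TN0001'), ('TN0002');  -- ... remaining values

SELECT a.*
FROM tableA a
INNER JOIN tbl b
    ON a.trackingNo = b.trackingNo;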

Upvotes: 2
