Sergei Illarionov
Sergei Illarionov

Reputation: 753

Speed up SQL query with huge amount of data

I have this SQL query

SELECT * 
FROM `storages_remains` 
WHERE `FREE` > 0 
  AND `CATALOG_ID` IN (...)

CATALOG_ID is not unique, because table may contains multistorages data.

CATALOG_ID + STORAGE_ID are unique, BUT I can't specify certain storage in that query, it should work for all.

FREE is an integer value of storages remains.

What can I do to optimize this SQL query?

I have an idea to cache something like map

product_id => storage_id => is_avail:bool

is are there any SQL solutions?

From the asker's comment, note 5000 values in the IN:

Table contains about i 1 billion rows in average. Its updates everyday. I expect about 2thousand in average output. Currently when i query with 5 thousands of product its take about 300ms. In in clause will be about 1-5 thousands ids

Currently table have following indexes enter image description here

All indexes was added by another programmers and i cant clearly understand how does free index works becouse FREE its just a number.

Is any way to create logical virtual column like avail = free>0 : true :false?

Currently query for example

EXPLAIN SELECT * FROM `s4y_catalog_storages_remains` WHERE `FREE`>0 AND CATALOG_ID IN('0d7bd57b-878a-11ed-96e5-001b21e8ace8','fa2f272f-ca6d-11ea-809e-a0369f4567aa','7661e585-5c7d-11e5-a9c4-a0369f4567aa')

Returns following info enter image description here

Explain plan from sequlize row query:

 [
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ],
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ]
]

Upvotes: 0

Views: 258

Answers (1)

tpdi
tpdi

Reputation: 35171

  1. Run showplan, then put an index on catalog_id. Run showplan and see if it's better. If not, try...

  2. Put an index on catalog_id and free. Run showplan. If not, try...

  3. If many rows are free = 0, put an index on catalog_id and free > 0. Run showplan. (This is basically the cache you wanted.) If not, try...

  4. IN often results in multiple index lookups or falling back to a table scan. You have an IN with 5000 ids? Yeah, you're either doing 5000 queries or a table scan. And that's huge, IN isn't meant to be used that way, and it may actually be slowing down at the query parser. Sometimes a join is faster. Create a table of the catalog ids in your IN, then join to that table in your query. Note that you'll need the index from solution 1. Run showplan.

Upvotes: 0

Related Questions