prranay

Reputation: 1889

Optimize SQL to fetch 1 day data

I need to fetch the last 24 hours of data, and this query runs frequently. Since it scans many rows, running it often hurts database performance.

MySQL's execution plan picks the index on created_at, which returns roughly 100,000 rows; these rows are then scanned one by one to apply the customer_id = 10 filter, and my final result has about 20,000 rows.

How can I optimize this query?

EXPLAIN SELECT  *
FROM    `order`
WHERE   customer_id = 10
  AND   `created_at` >= NOW() - INTERVAL 1 DAY;

id : 1
select_type : SIMPLE
table : order
partitions : NULL
type : range
possible_keys : idx_customer_id, idx_order_created_at
key : idx_order_created_at
key_len : 5
ref : NULL
rows : 103357
filtered : 1.22
Extra : Using index condition; Using where

Upvotes: 0

Views: 313

Answers (5)

Rick James

Reputation: 142433

This technique should be better than all the other answers, though perhaps by only a small amount:

Instead of orders being indexed thus:

PRIMARY KEY(order_id)   -- AUTO_INCREMENT
INDEX(customer_id, ...)  -- created_at, and possibly others

do this to "cluster" the rows together:

PRIMARY KEY(customer_id, order_id)
INDEX (order_id)   -- to keep AUTO_INCREMENT happy

Then you can optionally have more indexes starting with customer_id as needed. Or not.
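A sketch of the schema change described above, assuming order_id is the current AUTO_INCREMENT primary key (rebuilding the clustered index rewrites the whole table, so test on a copy first):

```sql
-- Re-cluster the table so each customer's orders are stored together.
-- Assumes `order_id` is the AUTO_INCREMENT column.
ALTER TABLE `order`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (customer_id, order_id),
  ADD INDEX (order_id);   -- keeps AUTO_INCREMENT valid
```

After this, the 20K rows for customer 10 sit in a contiguous range of the clustered index instead of being scattered by insertion order.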

Another issue -- What will you do with 20K rows? That is a lot to feed to a client, especially of the human type. If you then munch on it, can't you make a more complex query that does more work, and returns fewer rows? That will probably be faster.
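For example, if the client only aggregates those 20K rows, something along these lines (status and amount are hypothetical column names) pushes the work into MySQL and returns a handful of rows instead:

```sql
-- Aggregate server-side rather than shipping 20K rows to the client.
SELECT status, COUNT(*) AS orders, SUM(amount) AS total
FROM `order`
WHERE customer_id = 10
  AND created_at >= NOW() - INTERVAL 1 DAY
GROUP BY status;
```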

Upvotes: 0

Gordon Linoff

Reputation: 1270583

For this query:

SELECT o.*
FROM `order` o
WHERE o.customer_id = 10 AND
      created_at >= NOW() - INTERVAL 1 DAY;

My first inclination would be a composite index on (customer_id, created_at) -- as others have suggested.

But, you appear to have a lot of data and many inserts per day. That suggests partitioning plus an index. The appropriate partition would be on created_at, probably on a daily basis, along with an index on customer_id.

A typical query would access the two most recent partitions. Because your queries are focused on recent data, this also reduces the memory occupied by the index, which might be an overall benefit.
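A hedged sketch of that layout (partition names and date boundaries are illustrative and would normally be rotated by a scheduled job; note that MySQL requires the partitioning column to appear in every unique key, so the primary key would have to include created_at):

```sql
-- Illustrative daily RANGE partitioning on created_at.
ALTER TABLE `order`
  PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
  );

CREATE INDEX idx_customer ON `order` (customer_id);
```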

Upvotes: 0

Jacob

Reputation: 1936

It seems you are dealing with a very fast-growing table; I would consider moving this frequent query to a cold table or a read replica.

One more point: did you consider partitioning by customer_id? I don't quite understand the business logic behind querying customer_id = 10, but if it's a multi-tenancy application, try partitioning.
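For the multi-tenant case, a hash-style layout could look like the sketch below (the partition count is arbitrary, and MySQL requires customer_id to be part of every unique key, including the primary key, for this to be valid):

```sql
-- Illustrative only: spread tenants across partitions by customer_id.
ALTER TABLE `order`
  PARTITION BY KEY (customer_id) PARTITIONS 16;
```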

Upvotes: 0

DRapp

Reputation: 48169

Instead of two single-column indexes on customer_id and created_at, create a single composite index on ( customer_id, created_at ). This way the index engine can use BOTH parts of the WHERE clause instead of just one: jump right to the customer ID, then jump directly to the date desired, then return the results. It SHOULD be very fast.

Additional follow-up. I hear your comment about having multiple indexes, but you can fold those into the main one, appended after the first two columns, such as

( customer_id, created_at, updated_at, completion_time )

Then your queries could always include some help for the index in the WHERE clause. For example (I don't know your specific data): a record is created at some given point, and the updated and completion times will always be AFTER that. How long does it take, worst case, from creation to completion... 2 days, 10 days, 90 days?

WHERE customer_id = ?
  AND created_at >= NOW() - INTERVAL 10 DAY
  AND updated_at >= NOW() - INTERVAL 1 DAY

Again, just an example, but if a customer has thousands of orders and a relatively quick turnaround time, you could jump to the most recent ones and then find those updated within the time period. Again, just an option: a single index vs. 3, 4, or more indexes.

Upvotes: 1

The Impaler

Reputation: 48850

The first optimization I would do is on the access to the table:

create index ix1 on `order` (customer_id, created_at);

Then, if the query is still slow, I would try appending the columns you are selecting to the index. If, for example, you are selecting the columns order_id, amount, and status:

create index ix1 on `order` (customer_id, created_at, 
  order_id, amount, status);

This second strategy could be beneficial, but you'll need to test it to find out what performance improvement it produces in your particular case.

The big improvement of this second strategy is that it walks the secondary index only, avoiding the walk back to the table's primary clustered index (which can be time-consuming).
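You can verify the index is covering by checking the plan; with the five-column index in place, EXPLAIN should show "Using index" in the Extra column, meaning no clustered-index lookups are needed:

```sql
-- Select only columns contained in the index so it can be covering.
EXPLAIN SELECT order_id, amount, status
FROM `order`
WHERE customer_id = 10
  AND created_at >= NOW() - INTERVAL 1 DAY;
```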

Upvotes: 1
