Siddharth Verma
Siddharth Verma

Reputation: 113

In MySql, is it worthwhile creating more than one multi-column indexes on the same set of columns?

I am new to SQL, and certainly to MySQL. I have created a table from streaming market data named trade that looks like

 date     | time                  |instrument|price  |quantity
----------|-----------------------|----------|-------|--------
2017-09-08|2017-09-08 13:16:30.919|12899586  |54.15  |8000     
2017-09-08|2017-09-08 13:16:30.919|13793026  |1177.75|750      
2017-09-08|2017-09-08 13:16:30.919|1346049   |1690.8 |1         
2017-09-08|2017-09-08 13:16:30.919|261889    |110.85 |50       

This table is huge (150 million rows per date). To retrieve data efficiently, I have created an index date_time_inst (date,time,instrument) because most of my queries will select a specific date or date range and then a time range. But that does not help speed up a query like:

select * from trade where date="2017-09-08", instrument=261889

So, I am considering creating another index date_inst_time (date, instrument, time). Will that help speed up queries where I wish to get the time-series of one or a few instruments out of the thousands? In additional database write-time due to index update, should I worry too much?

I get data every second, and take about 100 ms to process it and store in a database. As long as I continue to take less than 1 sec I am fine.

Upvotes: 0

Views: 454

Answers (6)

Siva
Siva

Reputation: 1

Adding indexes to most columns in a database table is generally not a good practice unless there is a specific need.

Upvotes: 0

Jacques Amar
Jacques Amar

Reputation: 1833

SPACE IS CHEAP. Indexes take time creating/inserting (once), but shave time retrieving (Many many times)

My experience is to create as many indexes with all the relevant fields in all orders. This way, Mysql can choose the best index for your query.

So if you have 3 relevant fields

INDEX 1 (field1,field2,field3)
INDEX 2 (field1,field3)
INDEX 3 (field2,field3)
INDEX 4 (field3)

The first index will be used when all fields are present. The others are for shorter WHERE conditions.

Unless you know that some combinations will never be used, this will give MySQL the best chance to optimize your query. I'm also assuming that field1 is the biggest driver of the data.

Upvotes: 0

Rick James
Rick James

Reputation: 142528

Typo?

I assume you meant

select * from trade where date="2017-09-08" AND instrument=261889
                                            ^^^

Optimal index for such is

INDEX(instrument, date)

And, contrary to other Comments/Answers, it is better to have the date last, especially if you want more than one day.

Splitting date and time

It is usually a bad idea to split date and time. It is also usually a bad idea to have redundant data; in this case, the date is repeated. Instead, use

WHERE `time` >= "2017-09-08"
  AND `time`  < "2017-09-08" + INTERVAL 1 DAY

and get rid of the date column. Note: This pattern works for DATE, DATETIME, DATETIME(3), etc, without messing up with the midnight at the end of the range.

Data volume?

150M rows? 10 new rows per second? That means you have about 5 years' data? A steady 10/sec insertion rate is rarely a problem.

Need to see SHOW CREATE TABLE. If there are a lot of indexes, then there could be a problem. Need to see the datatypes to look for shrinking the size.

Will you be purging 'old' data? If so, we need to talk about partitioning for that specific purpose.

How many "instruments"? How much RAM? Need to discuss the ramifications of an index starting with instrument.

The query

Is that the main SELECT you use? Is it always 1 day? One instrument? How many rows are typically returned.

Depending on the PRIMARY KEY and whatever index is used, fetching 100 rows could take anywhere from 10ms to 1000ms. Is this issue important?

Millisecond resolution

It is usually folly to think that any time resolution is not going to have duplicates.

Is there an AUTO_INCREMENT already?

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1271003

First, your use case sounds like two indexes would be useful (date, instrument) and (date, time).

Given your volume of data, you may want to consider partitioning the data. This involves storing different "shards" of data in different files. One place to start is with the documentation.

From your description, you would want to partition by date, although instrument is another candidate.

Another approach would be a clustered index with date as the first column in the index. This assumes that the data is inserted "in order", to reduce movement of the data on inserts.

You are dealing with a large quantity of data. MySQL should be able to handle the volume. But, you may need to dive into more advanced functionality, such as partitioning and clustered indexes to get the functionality you need.

Upvotes: 0

ste-fu
ste-fu

Reputation: 7497

To get the most efficient query you need to query on a clustered index. According the the documentation this is automatically set on the primary key and can not be set on any other columns.

I would suggest ditching the date column and creating a composite primary key on time and instrument

Upvotes: 1

Darshan Mehta
Darshan Mehta

Reputation: 30849

A couple of recommendations:

  • There is no need to store date and time separately if time corresponds to time of the same date. You can instead have one datetime column and store timestamps in it
  • You can then have one index on datetime and instrument columns, that will make the queries run faster
  • With so many inserts and fixed format of SELECT query (i.e. always by date first, followed by instrument), I would suggest looking into other columnar databases (like Cassandra). You will get faster writes and reads for such structure

Upvotes: 0

Related Questions