Reputation: 113
I am new to SQL, and certainly to MySQL.
I have created a table from streaming market data named trade
that looks like
date | time |instrument|price |quantity
----------|-----------------------|----------|-------|--------
2017-09-08|2017-09-08 13:16:30.919|12899586 |54.15 |8000
2017-09-08|2017-09-08 13:16:30.919|13793026 |1177.75|750
2017-09-08|2017-09-08 13:16:30.919|1346049 |1690.8 |1
2017-09-08|2017-09-08 13:16:30.919|261889 |110.85 |50
This table is huge (150 million rows per date).
To retrieve data efficiently, I have created an index date_time_inst (date,time,instrument)
because most of my queries will select a specific date
or date range and then a time range.
But that does not help speed up a query like:
select * from trade where date="2017-09-08", instrument=261889
So, I am considering creating another index date_inst_time (date, instrument, time)
. Will that help speed up queries where I wish to get the time-series of one or a few instruments out of the thousands?
In additional database write-time due to index update, should I worry too much?
I get data every second, and take about 100 ms to process it and store in a database. As long as I continue to take less than 1 sec I am fine.
Upvotes: 0
Views: 454
Reputation: 1
Adding indexes to most columns in a database table is generally not a good practice unless there is a specific need.
Upvotes: 0
Reputation: 1833
SPACE IS CHEAP. Indexes take time creating/inserting (once), but shave time retrieving (Many many times)
My experience is to create as many indexes with all the relevant fields in all orders. This way, Mysql can choose the best index for your query.
So if you have 3 relevant fields
INDEX 1 (field1,field2,field3)
INDEX 2 (field1,field3)
INDEX 3 (field2,field3)
INDEX 4 (field3)
The first index will be used when all fields are present. The others are for shorter WHERE conditions.
Unless you know that some combinations will never be used, this will give MySQL the best chance to optimize your query. I'm also assuming that field1
is the biggest driver of the data.
Upvotes: 0
Reputation: 142528
Typo?
I assume you meant
select * from trade where date="2017-09-08" AND instrument=261889
^^^
Optimal index for such is
INDEX(instrument, date)
And, contrary to other Comments/Answers, it is better to have the date
last, especially if you want more than one day.
Splitting date and time
It is usually a bad idea to split date and time. It is also usually a bad idea to have redundant data; in this case, the date is repeated. Instead, use
WHERE `time` >= "2017-09-08"
AND `time` < "2017-09-08" + INTERVAL 1 DAY
and get rid of the date
column. Note: This pattern works for DATE
, DATETIME
, DATETIME(3)
, etc, without messing up with the midnight at the end of the range.
Data volume?
150M rows? 10 new rows per second? That means you have about 5 years' data? A steady 10/sec insertion rate is rarely a problem.
Need to see SHOW CREATE TABLE
. If there are a lot of indexes, then there could be a problem. Need to see the datatypes to look for shrinking the size.
Will you be purging 'old' data? If so, we need to talk about partitioning for that specific purpose.
How many "instruments"? How much RAM? Need to discuss the ramifications of an index starting with instrument
.
The query
Is that the main SELECT
you use? Is it always 1 day? One instrument? How many rows are typically returned.
Depending on the PRIMARY KEY
and whatever index is used, fetching 100 rows could take anywhere from 10ms to 1000ms. Is this issue important?
Millisecond resolution
It is usually folly to think that any time resolution is not going to have duplicates.
Is there an AUTO_INCREMENT
already?
Upvotes: 0
Reputation: 1271003
First, your use case sounds like two indexes would be useful (date, instrument)
and (date, time)
.
Given your volume of data, you may want to consider partitioning the data. This involves storing different "shards" of data in different files. One place to start is with the documentation.
From your description, you would want to partition by date
, although instrument
is another candidate.
Another approach would be a clustered index with date
as the first column in the index. This assumes that the data is inserted "in order", to reduce movement of the data on inserts.
You are dealing with a large quantity of data. MySQL should be able to handle the volume. But, you may need to dive into more advanced functionality, such as partitioning and clustered indexes to get the functionality you need.
Upvotes: 0
Reputation: 7497
To get the most efficient query you need to query on a clustered index. According the the documentation this is automatically set on the primary key and can not be set on any other columns.
I would suggest ditching the date
column and creating a composite primary key on time
and instrument
Upvotes: 1
Reputation: 30849
A couple of recommendations:
date
. You can instead have one datetime
column and store timestamps in itdatetime
and instrument
columns, that will make the queries run fasterSELECT
query (i.e. always by date
first, followed by instrument
), I would suggest looking into other columnar databases (like Cassandra). You will get faster writes and reads for such structureUpvotes: 0