Reputation: 7479
I'm interested to build a huge database (100s millions of records) using MySQL, to contain stock data in 1-min intervals. The database will contain data for 5000 stocks say for 10 years.
Two concerns:
(1) In the past, I had a problem of "slow insertions" -- meaning, at the beginning the rate of insertions was good, but as the table was filling up with millions of records, the insertion became slow (too slow!). At that time I used Windows and now I use Linux -- should it make a difference?
(2) I am aware of indexing techniques that will help queries (data retrievals) be faster. The thing is, is there a way to speed-up insertions? I know one can turn off indexing while inserting, but then 'building' the indexes post insertion (for 10s of millions of records!) also takes tons of time. any advice on that?
Any other Do's / Don'ts? Thanks in advance for any help.
Upvotes: 3
Views: 930
Reputation: 25205
Consider using an SSD drive (or array) to store your data, especially if you can't afford to create a box with gigs of memory. Everything about it should be faster.
Upvotes: 0
Reputation: 1665
It depends on what type of index you need and how you generate data. If you are satisfied with single index on time, just stick to that and when you generate data, keep on inserting in ascending order (with respect to the insert time for which you have the index). That way, the reordering required is minimal during insertion. Also, consider partitioning to optimize your queries. It can give you dramatic improvements in performance. Using auto-increment column can help for fast indexing, but then you won't have the index on time if auto-increment column is the only index. Make sure you use innodb storage engine for good performance. If you properly tune your database engine on Linux and keep the design simple, it will smoothly scale without much issues. I think the huge data requirements you talk about is not as difficult to build as it might seem first. However, if you are planning to run aggregate queries (with joins of tables), then that is more challenging.
Upvotes: 2
Reputation: 41
You could always keep your data in a table with no indexes and then use Lucene (or similar) to index the data. This will keep inserts fast and allow you to query Lucene for fast data retrieval.
Upvotes: 0