Reputation: 1519
I have a few tables with more than 100 million rows each. I get about 20-40 million new rows every month.
At the moment everything seems fine:
- all inserts are fast
- all selects are fast (they use indexes and don't involve complex aggregations)
However, I am worried about two things I've read about:
- When a table has a few hundred million rows, inserts might become slow, because it can take a while to rebalance the indexes (B-trees).
- If an index doesn't fit into memory, it might take a while to read it from different parts of the disk.
Any comments would be highly appreciated, as would any suggestions on how I can avoid these problems, or fix/mitigate them if/when they happen.
(I know we should start sharding at some point.)
Thank you in advance.
Upvotes: 35
Views: 33515
Reputation: 142528
I will attempt to address the points being made by the OP and the other responders. The Question only touches the surface; this Answer follows suit. We can dig deeper in more focused Questions.
- (DROP PARTITION is a lot faster than DELETEing a zillion rows.)
- INSERTs with an AUTO_INCREMENT PRIMARY KEY will 'never' slow down. This applies to any temporal key and/or small set of "hot spots". Example: PRIMARY KEY(stock_id, date) is limited to as many hot spots as you have stocks. (See the sketch after this list.)
- INSERTs with a UUID PRIMARY KEY will get slower and slower. But this applies to any "random" key.
- AUTO_INCREMENT may be less than optimal.
- (OPTIMIZE TABLE.)
- innodb_buffer_pool_size, which should (for starters) be about 70% of available RAM.

If you want to discuss your specific application, let's see some details. Different apps need different techniques.
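To make the "hot spot" contrast above concrete, here is a minimal sketch; the table and column names (stock_prices, stock_id, price) are my own illustration, not from the original answer:

    -- Inserts land at the "end" of each stock's range, so there are only as
    -- many hot spots as there are stocks, and inserts stay fast.
    CREATE TABLE stock_prices (
        stock_id INT UNSIGNED NOT NULL,
        date     DATE NOT NULL,
        price    DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (stock_id, date)
    ) ENGINE=InnoDB;

    -- A UUID key scatters inserts randomly across the whole index, which hurts
    -- once the index outgrows the buffer pool.
    CREATE TABLE stock_prices_by_uuid (
        id       CHAR(36) NOT NULL,   -- e.g. filled with UUID()
        stock_id INT UNSIGNED NOT NULL,
        date     DATE NOT NULL,
        price    DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;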
My blogs, which provide more details on many of the above topics: https://mysql.rjweb.org
Upvotes: 11
Reputation: 211730
Today is the day you should think about sharding or partitioning, because if you have 100M rows today and you're gaining them at roughly 30M per month, you're going to double the size of that table in about three months, and possibly double it again before the year is out.
At some point you'll hit an event horizon where your database is too big to migrate. Either you don't have enough working space left on your disk to switch to an alternate schema, or you don't have enough down-time to perform the migration before it needs to be operational again. Then you're stuck with it forever as it gets slower and slower.
The performance of write activity on a table is largely a function of how difficult its indexes are to maintain. The more data you index, the more punishing writes become. The type of index is also relevant: some are more compact than others. If your data is lightly indexed you can usually get away with more records before things start to get cripplingly slow, but that degradation factor is highly dependent on your system configuration, your hardware, and your I/O capacity.
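If you want a rough sense of how much index weight a table is carrying, comparing data size to index size is a quick check; this query is only a sketch and assumes your schema is named app_db:

    -- InnoDB statistics are estimates, so treat these numbers as approximate.
    SELECT table_name,
           ROUND(data_length  / 1024 / 1024) AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
    FROM   information_schema.tables
    WHERE  table_schema = 'app_db'
    ORDER  BY index_length DESC;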
Remember, InnoDB, the engine you should be using, has a lot of tuning parameters and many people leave it set to the really terrible defaults. Have a look at the memory allocated to it and be sure you're doing that properly.
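The most important of those parameters is the InnoDB buffer pool. As a sketch, on a hypothetical dedicated server with 32 GB of RAM, checking the current value and raising it to roughly 70% of RAM might look like this:

    -- Current value, in bytes.
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- MySQL 5.7+ can resize the buffer pool online; ~22G is about 70% of 32 GB.
    -- On older versions, set innodb_buffer_pool_size in my.cnf and restart.
    SET GLOBAL innodb_buffer_pool_size = 22 * 1024 * 1024 * 1024;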
If you have any way of partitioning this data, like by month, by customer, or some other factor that is not going to change based on business logic (that is, the data is intrinsically unrelated), you will have many simple options. If not, you'll have to make some hard decisions.
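For example, if the rows carry a timestamp, month-based range partitioning is one of the simpler options. This is only a sketch with made-up names (events, created_at); note that MySQL requires the partitioning column to be part of every unique key:

    CREATE TABLE events (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        payload    VARCHAR(255),
        PRIMARY KEY (id, created_at)        -- must include the partition column
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
        PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Retiring an old month is then far cheaper than deleting its rows.
    ALTER TABLE events DROP PARTITION p2024_01;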
The one thing you want to be doing now is simulating what your table's performance is like with 1 billion rows in it. Create a sufficiently large, suitably varied amount of test data, then see how well it performs under load. You may find it's not an issue, in which case there's nothing to worry about for another few years. If it is an issue, start panicking today and work towards a solution before your data becomes too big to split.
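One crude way to build such a data set, sketched here with the same hypothetical table names as above, is to seed a copy of the table and keep doubling it until it reaches the target size:

    CREATE TABLE events_test LIKE events;

    -- Seed with a slice of real (or generated) data.
    INSERT INTO events_test SELECT * FROM events LIMIT 1000000;

    -- Each pass doubles the row count; run repeatedly until the target size.
    INSERT INTO events_test (created_at, payload)
    SELECT created_at + INTERVAL FLOOR(RAND() * 365) DAY, payload
    FROM events_test;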
Database performance generally degrades in a fairly linear fashion, and then at some point it falls off a cliff. You need to know where this cliff is so you know how much time you have before you hit it. The sharp degradation in performance usually comes when your indexes can't fit in memory and when your disk buffers are stretched too thin to be useful.
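A reasonable early-warning signal for that cliff is the buffer pool miss rate; as a rough check:

    -- Innodb_buffer_pool_reads counts reads that missed the buffer pool and hit
    -- disk; Innodb_buffer_pool_read_requests counts all logical reads. If the
    -- ratio of the former to the latter keeps climbing, the working set
    -- (data + indexes) no longer fits in memory.
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';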
Upvotes: 43