Reputation: 31
You have a very large table – over 500 million records. The table is totally normalized. The table is an InnoDB table. Queries are slower than acceptable even though they are optimized as much as possible. Even though the table is already slow, it is projected that over the next 6 months it will double in size. What things would you consider looking into to address the current performance problem and allow for a potential quadrupling of the data in the table?
I have learned that if queries are slow, the problem is coming from either processing power, RAM, Disc or number of servers. Can you tell in child language how cloud computing or load balancing or adding RAM/CPU power/Disc help increase query response time from 7 seconds to 1 second for such a large number of rows? Let's say you have X servers and Y RAM and Z Discs of C capacity, which gives a query response time of S1 seconds. How can you vary X, Y, Z and C to increase/decrease S1 by 1 second?
Upvotes: 1
Views: 4704
Reputation: 142298
"The table is totally normalized" -- Since there is such a thing as "over-normalization", let's see the SHOW CREATE TABLE
for the tables.
INSERT
performance."I have learned that if queries are slow, the problem is coming from either processing power, RAM, Disc or number of servers." -- That's an old wives' tale. There are usually ways to improve the indexing and/or formulation of the queries and/or the schema (as mentioned above).
Are you familiar with 'composite' indexes?
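For illustration only -- the table and column names below are invented, not from your schema -- a composite index that matches both the filter and the sort can keep a query from scanning anywhere near 500M rows:

    -- Hypothetical schema; the real definition would come from SHOW CREATE TABLE.
    -- Suppose the slow query is:
    --   SELECT order_id, total
    --     FROM orders
    --    WHERE customer_id = 42
    --      AND created_at >= '2017-01-01'
    --    ORDER BY created_at;

    -- A composite index starting with (customer_id, created_at) lets InnoDB jump
    -- straight to the matching rows, already in ORDER BY order; including the
    -- selected columns makes it "covering", so the 500M-row base table is never
    -- touched for this query:
    ALTER TABLE orders
      ADD INDEX idx_customer_created (customer_id, created_at, order_id, total);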
"Can you tell in child language how cloud computing or load balancing or adding RAM/CPU power/Disc help increase query response time from 7 seconds to 1 second for such a large number of rows?" Answer: "None of those will help." MySQL performs one query in one CPU, and the I/O on the server is also likely to be single-threaded. Parallelism (that you allude to) does not exist in MySQL; when it does, it is up to the user to write the code, then bemoan that it did not help as much as expected.
"then its better to migrate those data to MongoDB or any other NoSQL db" -- You are missing the point. If you need to read 500M rows (or even 1M), it takes time. There is no magic bullet to make I/O faster.
Sorry to be vague, but there are dozens of principles that can significantly speed up working with 500M rows.
A big help in Data Warehousing is "Summary Tables". They often make things 10 times as fast. But they require you to build and maintain them. (Again, I am being vague due to lack of specifics about your use case.)
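A hedged sketch of the idea (all names invented for illustration): pre-aggregate the big table once, keep the aggregate up to date incrementally, and point the reports at the small table instead of the 500M-row one.

    -- Hypothetical daily summary of an "orders" fact table.
    CREATE TABLE orders_daily_summary (
      the_day      DATE          NOT NULL,
      customer_id  INT UNSIGNED  NOT NULL,
      order_count  INT UNSIGNED  NOT NULL,
      total_sales  DECIMAL(14,2) NOT NULL,
      PRIMARY KEY (the_day, customer_id)
    ) ENGINE=InnoDB;

    -- Maintain it incrementally, e.g. each night for the previous day; rerunning
    -- the same window is safe because ON DUPLICATE KEY UPDATE overwrites the row.
    INSERT INTO orders_daily_summary (the_day, customer_id, order_count, total_sales)
    SELECT DATE(created_at), customer_id, COUNT(*), SUM(total)
      FROM orders
     WHERE created_at >= CURDATE() - INTERVAL 1 DAY
       AND created_at <  CURDATE()
     GROUP BY DATE(created_at), customer_id
    ON DUPLICATE KEY UPDATE
      order_count = VALUES(order_count),
      total_sales = VALUES(total_sales);

    -- Reports then hit the summary table, which is orders of magnitude smaller:
    SELECT the_day, SUM(total_sales) FROM orders_daily_summary GROUP BY the_day;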
"for 99% of cases that land on stackoverflow, isn't the case" -- Maybe only 98%.
About the only hardware fix that can give you a 2x speedup is to replace a spinning drive with an SSD. CPUs have not improved by much in 18 years. 64 cores help when you have 64 connections, but not when you are timing the latency of 1 connection. Sharding is best done when the data needed can be separated across multiple servers.
Upvotes: 2
Reputation: 3819
I would suggest enabling the slow query log, starting by logging queries that take longer than 5 seconds. The queries from the log should then be performance-analysed. After that, do another round each for 4, 3, 2 and 1 seconds. Don't forget to switch off the log after finishing this analysis.
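A minimal sketch of that procedure, assuming you are allowed to SET GLOBAL on the server (these are standard MySQL variables; the log file path is only an example):

    -- Log statements that take longer than 5 seconds, without restarting the server.
    SET GLOBAL slow_query_log      = 'ON';
    SET GLOBAL long_query_time     = 5;
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

    -- ...let it run, analyse the logged queries (e.g. with EXPLAIN), then tighten:
    SET GLOBAL long_query_time = 4;   -- then 3, 2, 1 in later rounds

    -- When the analysis is finished, switch the log off again:
    SET GLOBAL slow_query_log = 'OFF';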
If you are still slow, then you could think about your hardware - is it a slow SAN, an ordinary hard disk, or an SSD? After this you could think about your RAM ... do you need more because the system swaps all the time? Last but not least, think about your CPU ... but maybe you are on a Raspberry Pi - which is normally slow ;-)
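One hedged way to check the RAM question from SQL (these are standard InnoDB counters; the ~1% figure is a common rule of thumb, not a hard limit):

    -- If the working set fits in the buffer pool, nearly all reads are served from RAM.
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
    SHOW GLOBAL STATUS    LIKE 'Innodb_buffer_pool_read_requests';  -- logical reads
    SHOW GLOBAL STATUS    LIKE 'Innodb_buffer_pool_reads';          -- reads that went to disk

    -- If Innodb_buffer_pool_reads is much more than about 1% of the read requests,
    -- the buffer pool (i.e. the RAM given to InnoDB) is likely too small for the data.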
Upvotes: 0