Nigel Polack

Reputation: 33

Only MySQL OR MySQL+SQLite OR MySQL+own solution

Currently I am building a quite big web system, and I need a strong SQL database solution. I chose MySQL over Postgres because some of the tasks need to be read-only (MyISAM engine) and others are massive-write (InnoDB).

I have a question about this read-only part. It has to be extremely fast: the user must get an answer in well under one second. Let's say we have one well-indexed table named "object" with no more than 10 million rows, another one named "element" with around 150 million rows, and a table named "element_object" (hundreds of millions of rows) that connects rows of "element" with rows of "object".
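For illustration, a simplified sketch of the three tables (column names besides the ids are placeholders, not the real schema):

    -- Sketch only: columns besides the ids are invented.
    CREATE TABLE object (
        id   INT UNSIGNED NOT NULL PRIMARY KEY,
        data VARCHAR(255)               -- whatever attributes "object" has
    ) ENGINE=MyISAM;                    -- the read-only part

    CREATE TABLE element (
        id   INT UNSIGNED NOT NULL PRIMARY KEY,
        name VARCHAR(255),
        KEY idx_name (name)             -- the lookup key for step 1 below
    ) ENGINE=MyISAM;

    CREATE TABLE element_object (       -- links "element" to "object"
        element_id INT UNSIGNED NOT NULL,
        object_id  INT UNSIGNED NOT NULL,
        KEY idx_element (element_id)
    ) ENGINE=MyISAM;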

So we are going to partition the tables "element" and "element_object" and end up with 8192 tables "element_hash_n{0..8191}" and 24576 tables "element_object_hash_n{0..8191}_m{0..2}".
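For example, assuming the shard number n is a hash of the element's lookup key modulo 8192 (CRC32 here is just a stand-in for whatever hash function we end up using), the routing would be:

    -- Hypothetical routing: the hash function is an assumption.
    SELECT CRC32('some-element-name') % 8192 AS n;
    -- The application then queries the table element_hash_n<n>.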

An answer to the user's question would be a two-step search (a SQL sketch follows the list):

  1. Find the id of the element in the "element_hash_n" tables.
  2. Run the main SQL select on the table "object", joined with the table "element_object_hash_n_m", and filter the result by the id found in step 1.
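Here is the sketch of the two steps, with an invented shard suffix n=42, m=1 and placeholder names:

    -- Step 1: find the element id in the right hash table.
    SELECT id FROM element_hash_n42
    WHERE name = 'some-element-name';

    -- Step 2: main select on "object", joined through the matching
    -- element_object shard, filtered by the id found in step 1.
    SELECT o.*
    FROM object o
    JOIN element_object_hash_n42_m1 eo ON eo.object_id = o.id
    WHERE eo.element_id = 12345;        -- id returned by step 1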

I wonder about the first step. What would be better:

  1. store all 32k+ tables in MySQL
  2. create one SQLite database and store the 8192 first-step tables there
  3. create 8192 separate SQLite files (databases) (sketched after the list)
  4. create 8192 files in the file system and write my own binary solution to find the id
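For option 3, every hash bucket would be its own SQLite file; a minimal sketch of the first-step lookup (file and table names are placeholders):

    -- Option 3 sketch: one SQLite file per hash bucket, chosen by the
    -- application (e.g. bucket 42 -> element_hash_n42.db).
    ATTACH DATABASE 'element_hash_n42.db' AS shard;
    SELECT id FROM shard.element_hash WHERE name = 'some-element-name';
    DETACH DATABASE shard;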

I'm sorry for my English. It's not my native language.

Upvotes: 3

Views: 130

Answers (1)

Johnny Graber

Reputation: 1467

I think you are making way too many partitions. With more than 32000 partitions, you have tremendous management overhead. Given the name element_hash_*, it seems you want to hash your elements and partition the data that way. But a hash will (most likely) distribute the data evenly over all partitions, so I can't see how this should improve performance. If your data is accessed across all those partitions, you gain nothing by making each partition small enough to fit in memory: almost every query will need to load data from a different partition.

We used partitions on a transaction system where more than 90% of the queries used the current day as a criterion. In such a case, partitioning by day worked very well. But we only had 8 partitions and moved older data off to another database for long-term storage.
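For comparison, a sketch of that kind of day-based range partitioning (table name, columns, and dates are invented for illustration):

    -- Invented example of day-based range partitioning.
    CREATE TABLE transactions (
        id         BIGINT UNSIGNED NOT NULL,
        created_at DATE NOT NULL,
        amount     DECIMAL(12,2)
    )
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p20110101 VALUES LESS THAN (TO_DAYS('2011-01-02')),
        PARTITION p20110102 VALUES LESS THAN (TO_DAYS('2011-01-03')),
        -- ... one partition per day, only a handful kept online ...
        PARTITION pmax      VALUES LESS THAN MAXVALUE
    );

    -- A query on the current day prunes down to a single partition:
    SELECT SUM(amount) FROM transactions WHERE created_at = '2011-01-02';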

My advice: try to find out which data will be needed that fast and group it together. You will also need to run your own performance tests. If it is so important to deliver data that fast, there should be enough management support to build a decent test environment. Maybe your test results will show that you simply cannot deliver the data fast enough with a relational database system. If so, you should look at NoSQL (as in "not only SQL") solutions.

What technology are you building your web system in? You should test that part as well. A super-fast database will not help you much if you lose the time in a poorly performing web application.

Upvotes: 2
