shabby

Reputation: 3222

Best way to deal with big data in MySQL

Current Design

Previously, my co-worker designed a database with tables customer_0, customer_1, ... customer_9, whereby all customer ids are split across 10 different tables based on the last digit of the id.

The Problem with this design:

  1. I don't find it to be standard practice.
  2. To work with it, you always have to build queries as strings, whether in stored procedures or in application code: you pass in the id, extract its last digit at runtime, and construct the query against the matching table (see the sketch after this list).
  3. To apply a foreign key constraint, the referenced tables have to be split in the same fashion (I wouldn't use the term "partitioned" here, because this kind of splitting is not partitioning), even if they are not intended to store huge amounts of data. For example, the customer_sales table also has to be split into 10 parts, since a customer has a one-to-many relation with customer_sales and the foreign key has to point at the matching customer table.
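Roughly, every lookup ends up going through dynamic SQL along these lines (a simplified sketch; the procedure, table, and column names are placeholders, not our real schema):

    DELIMITER //
    CREATE PROCEDURE get_customer(IN p_customer_id INT)
    BEGIN
        -- Pick the table from the last digit of the id, then build the query as a string.
        SET @tbl = CONCAT('customer_', p_customer_id MOD 10);
        SET @sql = CONCAT('SELECT * FROM ', @tbl, ' WHERE id = ?');
        SET @id  = p_customer_id;
        PREPARE stmt FROM @sql;
        EXECUTE stmt USING @id;
        DEALLOCATE PREPARE stmt;
    END //
    DELIMITER ;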

My Design

While trying to figure out a work-around, I learned that you can do table partitioning, which solves exactly this problem (referring to this question).
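As far as I understand it, HASH partitioning with 10 partitions would mimic the last-digit split inside a single table, so queries just reference customer and MySQL routes them to the right partition. A minimal sketch (assuming an integer primary key id; the other columns are placeholders):

    CREATE TABLE customer (
        id   INT NOT NULL,
        name VARCHAR(100),
        PRIMARY KEY (id)
    )
    PARTITION BY HASH(id)
    PARTITIONS 10;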

The Problem with the partitioning approach

Now the problem with this approach is that you cannot have foreign key constraints on partitioned tables at all, so it doesn't solve the problem either.

DB Sharding or "Shared Nothing"

Then I came across this, which uses schema replication: as I understand it, you replicate the same schema on different physical servers, and each request is routed to the appropriate shard based on its shard key.
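As I understand it, every shard holds an identical copy of the schema (foreign keys intact), and the application decides which server to connect to from the shard key. A rough sketch of what would live on each shard (the server names and columns are made up for illustration):

    -- Created identically on, say, shard0.example.com and shard1.example.com
    CREATE TABLE customer (
        id   INT NOT NULL PRIMARY KEY,
        name VARCHAR(100)
    );

    CREATE TABLE customer_sales (
        id          INT NOT NULL PRIMARY KEY,
        customer_id INT NOT NULL,
        amount      DECIMAL(10,2),
        FOREIGN KEY (customer_id) REFERENCES customer (id)
    );

    -- The application then routes: shard = customer_id MOD <number of shards>,
    -- connects to that server, and runs ordinary queries against it.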

My Question

What should I do now? I cannot let go of the foreign key constraints, which rules out table partitioning. Should I drop partitioning and sharding altogether, stick to a conventional schema, and leave the sharding part to the DBA?

Note: max expected customer base is 10 million.

Upvotes: 1

Views: 2974

Answers (1)

Ray

Reputation: 41478

Yes, let go of partitioning and sharding for now and stick with a conventional, simple schema. There is plenty of lower-hanging fruit that can get you the performance you need while keeping FK constraints at the data size you mention.

All the 'sharding' you're describing looks like someone took a swing at premature optimization for a future that isn't even expected, if all you're growing to is 10 million customers/records.

Also, I really wouldn't classify your situation as 'Big Data', despite how the term gets thrown around everywhere.

Assuming a table with a reasonable number of columns, say fewer than 30 columns of less than 32 bytes each (char(32)), 10 million rows is nothing for MySQL to handle when properly indexed and given adequate memory to hold the InnoDB tables in memory (I'm assuming you're using InnoDB). I'm currently working with tables a factor of 10x that size on an AWS xlarge RDS instance without any issues, outside the occasional time it takes to do a SQL dump or a table alteration.
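If you want to sanity-check whether the data actually fits in memory, something along these lines works (the schema name is a placeholder for yours):

    SELECT table_name,
           ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
    FROM information_schema.tables
    WHERE table_schema = 'your_db';

    -- Compare the total against the configured buffer pool size:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';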

I'd combine all the various customer tables into a single table and take a long look at every query that hits it. Run EXPLAIN on them to see where you really need indexes. Keep the FK constraints you need, and make sure you have suitable covering indexes where appropriate.
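Something along these lines, with placeholder column names since I don't know your actual schema:

    -- Consolidate the split tables, assuming a new `customer` table with the
    -- same column layout has already been created.
    INSERT INTO customer SELECT * FROM customer_0;
    INSERT INTO customer SELECT * FROM customer_1;
    -- ...and so on through customer_9.

    -- Check whether a hot query actually uses an index.
    EXPLAIN SELECT id, email
    FROM customer
    WHERE last_name = 'Smith';

    -- If it ends up scanning the whole table, add a covering index for that query
    -- (InnoDB secondary indexes implicitly include the primary key, so id is covered).
    CREATE INDEX idx_customer_last_name ON customer (last_name, email);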

I doubt you need table partitioning to get good performance on the size of data you've indicated.

Upvotes: 3
