SQL database: use one table with 3,5M entries or many tables with less entries?

Question

I'm building a football game that allows people to create leagues online.

Say I have 1000 league after some time. I store my data in SQL. Should I:

1) Create 1 single database with the table "rosters" and insert all the players of all the leagues into it (which would approximate 3,5 to 4 million entries for 1000 leagues)

2) Create 1 single database with 20 tables "rosters1, rosters2, etc" and split the leagues rosters between those 20 tables

3) Create 1 database for each league with a table "rosters" in each

4) Create 20 databases with one table "rosters" in each and split the leagues rosters between the 20 databases.

What is the best option for performance and quick SQL queries?

JasonTrue · Accepted Answer

It sounds like you need to learn a few database basics, first, so please spend some time making sure you understand Database Normalization and indexes.

However, in principal, you probably need a "players" table (player id, name, etc), a "roster" table (roster id, name, maybe owner_id), and a player_roster_map (player_id, roster_id) that links the two. You'll probably want a unique constraint on the composite value of player_id and roster_id, and you should have a foreign key constraint on both player_id and roster_id in that mapping table.

(I'm presuming that players are not really unique per roster; you can still model similarly if there are attributes that are overridden by the person that owns them, but I'm guessing based on my understanding of your description of the model.)

Your database engine will probably index anything that you mark as a primary key by default, and may index anything that you mark as a foreign key, but this is database dependent, and I am not particularly an expert on MySql; you'll need to do some research.

Proper indexes will help you when you execute queries, at a generally small cost in terms of insert performance. If you end up with large numbers of identical values or a high utilization of queries in which only a small range of values (such as time ranges), you may eventually need to understand partitioning, as well. Partitioning allows you to get the benefits of dealing with fairly large number of records without the modeling problems that denormalization would introduce. However, don't prematurely take this step; you'll probably find that a correct database design will be enough with the size of dataset that you're implying.

SQL database: use one table with 3,5M entries or many tables with less entries?

Answers (2)

Related Questions