Reputation: 110257
I need to make a comparison of several million titles, to see the intersection and difference.
The data looks something like this:
Title        Platform
Terminator   1
Terminator   2
Abyss        1
I want to find out things such as: "Show me all titles that are on Platform 1 but not Platform 2" ("Abyss"), "Show me all titles that are on both platforms" ("Terminator"), and "Show me all titles that are on Platform 2 but not Platform 1" (an empty result, in this sample).
I only have two platforms to compare, 1 and 2. Should I store all the data in one table, or store it in two separate tables with the same structure? Which would provide better performance?
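For concreteness, with the current layout (one row per title/platform pair), the comparisons might look something like this; the table and column names (title_platform, title, platform) are assumptions based on the sample data:

    -- Titles on Platform 1 but not Platform 2 (anti-join)
    SELECT tp1.title
    FROM title_platform tp1
    LEFT JOIN title_platform tp2
           ON tp2.title = tp1.title
          AND tp2.platform = 2
    WHERE tp1.platform = 1
      AND tp2.title IS NULL;

    -- Titles on both platforms
    SELECT title
    FROM title_platform
    GROUP BY title
    HAVING SUM(platform = 1) > 0
       AND SUM(platform = 2) > 0;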
Upvotes: 3
Views: 56
Reputation: 1269933
If those are the three queries that you have to answer, and you have to answer them quickly, then I would suggest the less normalized route: one row per title, with a flag for each platform.
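A minimal sketch of that layout (the table and column names are illustrative assumptions, not a prescribed schema):

    -- One row per title, with a presence flag per platform
    CREATE TABLE titles (
        title         VARCHAR(255) NOT NULL PRIMARY KEY,
        on_platform_1 BOOLEAN NOT NULL DEFAULT FALSE,
        on_platform_2 BOOLEAN NOT NULL DEFAULT FALSE
    );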
(Of course, this is less of an option if there is additional information to store for each platform.)
With multiple rows per title, you will have to use either a group by or a join to bring the data together to answer your sample questions. With the data in one row per title, you can readily answer these questions using index lookups.
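With the flag columns sketched above, each of the sample questions becomes a simple filter, and an index on the flag columns can serve them directly:

    -- On Platform 1 but not Platform 2
    SELECT title FROM titles WHERE on_platform_1 AND NOT on_platform_2;

    -- On both platforms
    SELECT title FROM titles WHERE on_platform_1 AND on_platform_2;

    -- On Platform 2 but not Platform 1
    SELECT title FROM titles WHERE on_platform_2 AND NOT on_platform_1;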
The "downside" -- if it is one -- is that an operation that was an insert
would now be an update
. Actually, MySQL has the very convenient insert . . . on duplicate key update
syntax to handle this situation very well.
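For example, recording that a title is available on Platform 2 (again using the assumed schema above, where title is the primary key):

    -- Inserts the title if it is new; otherwise just sets the Platform-2 flag
    INSERT INTO titles (title, on_platform_2)
    VALUES ('Terminator', TRUE)
    ON DUPLICATE KEY UPDATE on_platform_2 = TRUE;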
If your data structures are actually much more complicated than in the question, this might not be feasible. My attitude toward denormalization is basically to keep the data normalized, unless there is a good reason. Your three sample queries provide a good reason, assuming that performance needs to be optimized for them.
Upvotes: 6
Reputation: 156978
I think this mostly depends on effective use of indexes and well-written queries. Splitting tables, or using partitions to divide table data, is only a worthwhile solution when a really large amount of data is involved.
Splitting or partitioning is useful with (very) large rows or historic data that cause memory-cache problems. You can calculate this yourself: a million rows of 1 KB each needs 1 GB of memory to load the full table; if the database can use an index and read 'just' 100 rows, it needs only 100 KB.
My 2 cents: keep the current structure and look for optimization opportunities through indexes, etc.
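For instance, with the one-table layout from the question, a composite index can answer the comparison queries entirely from the index (a sketch, using the same assumed names as above):

    -- Covering index: filter by platform, with titles available in the index itself
    CREATE INDEX idx_platform_title ON title_platform (platform, title);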
Upvotes: 0