Filippo oretti
Filippo oretti

Reputation: 49817

Query JOIN or not? (optimization)

i was wondering if to use 2 tables is better then using 1 single table .

Scenario:

i have a simple user table and a simple user_details table. i can JOIN tables and select both records.

But i was wondering if to merge 2 table into 1 single table.

what if i have 2milions users records in both tables?

in terms of speed and exec time is better to have a single table when selecting records?

Upvotes: 2

Views: 207

Answers (5)

mjv
mjv

Reputation: 75095

To denormalize or not to denormalize, that is the question...

There is no simple, one-size-fits all response to this question. It is a case by case decision.

In this instance, it appears that there is exactly one user_detail record per record in the user table (or possibly either 1 or 0 detail record per user record), so shy of subtle caching concerns, there is really little no penalty for "denormalizing". (indeed in the 1:1 cardinality case, this would effectively be a normalization).

The difficulty in giving a "definitive" recommendation depends on many factors. In particular (format: I provide a list of questions/parameters to consider and general considerations relevant to these):

  • what is the frequency of UPDATEs/ DELETEs / INSERTs ?
  • what is the ratio of reads (SELECTs) vs. writes (UPDATEs, DELETEs, INSERTs) ?
  • Do the SELECT usually get all the rows from all the tables, or do we only get a few rows and [often or not] only select from one table at a given time ?

If there is a relative little amount of writes compared with reads, it would be possible to create many indexes, some covering the most common queries, and hence logically re-creating of sort, in a more flexible fashion the two (indeed multiple) table setting. The downside of too many covering indices could of course be to occupy too much disk space (not a big issue these days) but also to possibly impede (to some extent) the cache. Also too many indices may put undue burden on write operations...

  • what is the size of a user record? what is the size of a user_detail record?
  • what is the typical filtering done by a given query? Do the most common queries return only a few rows, or do they yield several thousand records (or more), most of the time?

If any one of the record average size is "unusually" long, say above 400 bytes, a multi-table may be appropriate. After all, an somewhat depending on the type of filtering done by the queries, the JOIN operation are typically very efficiently done by MySQL, and there is therefore little penalty in keeping separate table.

  • is the cardinality effectively 1:1 or 1:[0,1] ?

If it isn't the case i.e if we have user records with more than one user_details, given the relatively small number or records (2 millions) (Yes, 2M is small, not tiny, but small, in modern DBMS contexts), denormalization would probably be a bad idea. (possible exception with cases where we query several dozens of time per second the same 4 or 5 fields, some from the user table, some from the user_detail table.

Bottom lines:

  • 2 Million records is relatively small ==> favor selecting a schema that is driven by the semantics of the records/sub-records rather than addressing, prematurely, performance concerns. If there are readily effective performance bottlenecks, the issue is probably not caused nor likely to be greatly helped by schema changes.
  • if 1:1 or 1:[0-1] cardinality, re-uniting the data in a single table is probably a neutral choice, performance wise.
  • if 1:many cardinality, denormalization ideas are probably premature (again given the "small" database size)
  • read about SQL optimization, pro-and-cons of indexes of various types, ways of limiting the size of the data, while allowing the same fields/semantics to be recorded.
  • establish baselines, monitor the performance frequently.

Upvotes: 2

Phil Sandler
Phil Sandler

Reputation: 28016

You should easily be able to make either scenario perform well with proper indexing. Two million rows is not that many for any modern RDBMS.

However, one table is a better design if rows in the two tables represent the same logical entity. If the user table has a 1:1 relationship with the user_detail table, you should (probably) combine them.

Edit: A few other answers have mentioned de-normalizing--this assumes the relationship between the tables is 1:n (I read your question to mean the relationship was 1:1). If the relationship is indeed 1:n, you absolutely want to keep them as two tables.

Upvotes: 4

J-16 SDiZ
J-16 SDiZ

Reputation: 26910

The current trend is denormalize (i.e. put them in the same table). It usually give better performance, but easier to make inconsistent (programming mistake, that is).

Plan: determine your workload type.

Benchmark: See if the performance gain worth the risk.

Upvotes: 1

Code Magician
Code Magician

Reputation: 23972

Joins themselves are not inherently bad; RDBMS are designed to perform joins very efficiently—even with millions or hundreds of millions of records. Normalize first before you start to de-normalize, especially if you're new to DB design. You may ultimately end up incurring more overhead maintaining a de-normalized database than you would to use the appropriate joins.

As to your specific question, it's very difficult to advise because we don't really know what's in the tables. I'll throw out some scenarios, and if one matches yours, then great, otherwise, please give us more details.

  1. If there is, and will always be a one-to-one relationship between user and user_details, then user details likely contains attributes of the same entity and you can consider combining them.

  2. If the relationship is 1-to-1 and the user_details contains LOTS of data for each user that you don't routinely need when querying, it may be faster to keep that in a separate table. I've seen this often as an optimization to reduce the cost of table scans.

  3. If the relationship is 1-to-many, I'd strongly advice against combining them, you'll soon wish you hadn't (as will those who come after you)

  4. If the schema of user_details changes, I've seen this too where there is a core table and an additional attribute table with variable schema. If this is the case, proceed with caution.

Upvotes: 2

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52107

Denormalization will generally use-up more space while affording better query performance.

Be careful though - cache also matters and having more data effectively "shrinks" your cache! This may or may not wipe-out the theoretical performance benefit of merging two tables into one. As always, benchmark with representative data.

Of course, the more denormalized your data model is, the harder it will be to enforce data consistency. Performance does not matter if data is incorrect!

So, the answer to your question is: "it depends" ;)

Upvotes: 1

Related Questions