john
john

Reputation: 23

Will normalising my database kill the scalability?

I have a database that will form part of a highly trafficked web app.

I'm wondering if I should normalise the tables so things such as (e.g.) 'question_type' should be in a separate table too all the basic information about the question such as 'title' and 'question_body'?

I'm only asking because I need this database to be as scalable as possible and I'm told normalisation isn't always the way to go when you need scalability.

Thanks

Upvotes: 1

Views: 381

Answers (4)

unpythonic
unpythonic

Reputation: 4070

If your table has a question_body and question_type, then I don't see how moving it to another table achieves normalization. e.g.:

table question (
    question_body      text,
    question_user      text,
    question_user_rank integer,
    question_type      text
);

Splitting out a single value into a single column table won't achieve anything other than useless joins. That is:

select * from question q join question_type qt on (q.qt_id = qt.id)
  where qt.name = 'sql questions';

is an equivalent, but wasteful form of

select * from question
  where question_type = 'sql questions';

On the other hand, (using the example above), it makes a lot of sense to split out the question user information into its own table:

table question (
   question_body     text,
   question_type     text,
   question_user_id  integer references question_user(id) on delete cascade
);
table question_user (
   id                integer,
   name              text,
   rank              integer
);

So if a user has his rank changed (ala SO), you only have to change it in one place rather than in every row where he's asked a question. You've increased your ability to handling scaling since you've changed hundreds of updates into a single update.

Upvotes: 1

Data Monk
Data Monk

Reputation: 1299

Now that's a loaded question. Normalization isn't a hard rule so much as a guideline. Designing a database is made up of a series of decisions regarding the level of normalization that makes sense given your need for code efficiency, performance and integrity, among other things. That's greatly oversimplifying it, but the spectrum of design decisions spans volumes of well-authored books.

Can you tell me a little bit more about your application and intended platform? I might be able to steer you in the direction of some very useful reference material if I can better understand your situation.

Upvotes: 0

TomTom
TomTom

Reputation: 62091

Will adding salt make my food taste better?

Same question. Noone can answer.

The main proble mis that it depends on your USAGE patterns and to soem degree your competence as programmer, to use lookup caches in the application instead of database joins. Quite a lot of programmers never get above the "scrambled eggs, burned" level of SQL, to keep a cooking analogy.

For scalability application design AND database technology have a lot more to say. Hard to beat an Oracle RAC installation. Depending on what you need on an Exadata platform. Cost is I think around half a million USD for the smallest unit. Still sure you need "as scalable as possible"? Not joking here - I right now work on a 6000 gb data warehouse, we just ordered 3 of those monsters, and not the smallest one.

So, what do you mean with "as scalable as possible"? THis is like "my car needs to go as fast as a car ever has gone and more", then you end up with a special made car with a jet engine in it ;)

General rule: * Separate transactions and reporting into two databases. The second being a data warehouse. * Normalize transactional db * Use star schema on data warehouse.

BIG chance is: you dont kno what you talk ab out, never did scalability, so there is a 80% chance your "high scalability" requirement is a joke for a decent database server. Now, that is not meant insulting, but i have seen SO many people say "I have a ton of data in a table" which turns ou to be 10.000 rows maximum. That is not a ton - it is a joke. We load 100 million daily into our data warehouse main table (and have to keep them many years). Most peope dont really get the speed a decent database server can provide. Which means many discs.

Upvotes: -1

ekaqu
ekaqu

Reputation: 2058

The thing that makes normalization an issue with scaling is that it tends to need to have multiple tables join together. Joins are great on small tables but the larger the table grows the harder the server needs to work.

The main thing to look to is avoiding joins. If you can do the query without a join by adding a field to one of the tables, you just speed up the performance of that query.

Upvotes: 1

Related Questions