Reputation: 9682

Simple SQL schema design? When do we implement redundant relationships?

I often find myself in a situation where I don't know how redundant my SQL relationships should be. Here's a simple example:

In both cases, all data is accessible via joins. If I want to query the school in the first example (if I only know the student_id), I would need two joins. In the second example, I'd need one less join but the relationships are more complicated.

I'm not sure if both approaches are valid or if one is an antipattern in SQL?

(of course, this is a simplified example)

Upvotes: 0

Answers (3)

Neville Kuyt

Reputation: 29629

It's worth thinking through the implications of both designs.

I assume this is a hypothetical scenario - in practice, you're likely to have many-to-many relationships between school and class, and between class and student.

The first option you present is a normalized design. This means you're making explicit statements about the problem domain:

A school has 0 or more classes
A class belongs to exactly 1 school
A class has 0 or more students
A student belongs to exactly on class

The information about the relationships between your entities is contained in a single place. This has several benefits - you only have to update it in one place if a student changes class, or a class changes school. This is much less likely to cause bugs in future. The schema is easy to understand for new developers who are used to a normalized database model - you don't have to worry about bugs, or remember to update two different tables when relationships change.

The "drawback" is that you need joins to answer queries like "find all students for a school". I've put quotes around drawback, because joins are inherent to the relational model - it's like saying feet are a drawback of having legs. As long as you have indexes on the identifying columns, there's usually no noticable performance impact.

Option 2 is de-normalized by duplicating the information about the relationship between class and school in the student table. De-normalization is not quite an anti-pattern, but it brings risks. The duplication of information means you have to maintain it in two places, and handle bugs where that information gets out of sync. For non-trivial systems, this can be a lot of work.

Denormalization is used when there are performance challenges with the normalized model. On modern hardware, that's typically in databases with hundreds of millions of records or more.

Upvotes: 2

fatalcoder524

Reputation: 1480

I would follow the second way as it is more realistic to a real-world situation.

Eg:

Considering a university like VTU which has 4 regions and different colleges in each region, the student is identified with a USN in the format region_code + college_code + year + branch_code + student_code .

ex: 4 XX 10 YY ZZZ

I could easily query something like:-

All students of a region.
All students of a college.
All students of a region of some branch. etc...

Upvotes: 1

rlm96

Reputation: 193

Check what is normalization. The second schema is often used in star schema in datawarehouses. This is because the difference implicit in oltp vs olap systems. In oltp relations using joins are required, but if it's an "only read" (olap) database joins don't carry a good performance.

Upvotes: 1

Simple SQL schema design? When do we implement redundant relationships?

Answers (3)

Related Questions