federicot
federicot

Reputation: 12341

Simpler queries vs. simpler database

In my website I have a forum in which you can start a thread, respond to one, delete your posts, report others' posts, etc.

I decided to make an abstraction and to create a table called activity which would store the user_id, time (the fields every activity has in common), and the type of activity (thread, response, report, deletion) and the effective_id (the id that would correspond to respective table).

At the beginning this seemed like a great idea, because it would help to avoid redundancy and would make it easier to look up what a certain user has done without having to ask each table. But at the same time, this abstraction results in more complicated queries (e.g. having to use INNER JOIN in almost every simple query) and now I'm having trouble with more complex queries.

So my question is: did I take the right decision? Is in a real-life environment so important to have a perfect database even if it results in very expensive queries?

Upvotes: 0

Views: 75

Answers (4)

Walter Mitty
Walter Mitty

Reputation: 18940

I am not sure from your question what your purpose was in splitting off the common features of all activities, nor in what manner complex queries are giving you trouble.

The description of types of activities (thread, response, report, deletion) certainly look like a classic case of types and subtypes otherwise known as classes and subclasses. There are two classic design patterns for this case, known as Single Table Inheritance and Class Table Inheritance. There are tags for these two design patterns. There is another design pattern, Shared Primary Key that can be used to good effect in conjunction with Class Table Inheritance.

The shift from STI to CTI does involve some table decomposition, and it's similar to the decomposition you describe. The use of SPK obviates the need for separate id fields for each specialized subclass table, and also obviates the need for a separate type field. This might result in simpler queries than the ones you ended up struggling with. Without seeing those queries, it's impossible to know.

It's worth noting that this decomposition is not "normalization" in the sense of gaining conformance to any of the standard normal forms 2NF through 5NF. But those normal forms are concerned with simplicity of update, not simplicity of queries anyway.

A good design is made in consideration of the ways the data is going to be used. Sometimes we learn that along the way. I have no idea what a perfect design looks like. I think we're always involved with trade-offs.

Upvotes: 1

JonH
JonH

Reputation: 33153

The whole point of relational database system is to relate data (by separating entities in the correct tables), by doing so you introduce joins. This is normal and a good database design technique.

A lot of database beginners try to avoid joins and add attributes from various entities to avoid having to do a INNER join, but this is not a proper technique and will bite you in the long run. Joins are there for a reason and should be used when you need to relate data.

In your example you are in essence creating a "log" file. The activity on your application is simply a log to what the user is doing. Ask yourself, what would be better a log for a post, a log for a report, a log for deletions, a log for, in addition many log tables or one simple log table with a referenceID as to what activity the user initiated (this in fact is a FK). The answer is you'd want one log table that contains a foreign key have you of what the user initiated (a delete, a add, a flag, etc).

The question you want to ask yourself though is, why are you displaying the activity information when a user simply wants to get this data. You can always keep this data normalized as you have but only select it when you need it. I don't understand why you are joining to this activity table.

Upvotes: 1

David Hoerster
David Hoerster

Reputation: 28701

I think I would have gone in the same direction that you did 3-5 years ago, but now I would favor a simpler, flatter database design for an activity that you describe instead of going with a more normalized approach.

My use and understanding of CQRS and Task Driven UI really helped me realize this -- although it may not pertain to your situation.

Essentially, I used to optimize my databases so that inserts, updates and deletes were very efficient, but this would result in joining many tables for simple selects. The problem is that 80% of the time, the user wants to select. So optimizing my database structure for what the user does the majority of the time really helps with the overall performance of the application, and in my opinion, the maintenance and extensibility of the system.

I'm making some assumptions about your application and what you're trying to do, but an activity table, to me, sounds like something that could be fed from a queue and built up by a separate worker process/thread that's monitoring the queue for work items. As those work items (commands??) are found, the worker process would flatten out the data and update the appropriate activity table(s). When you went to query your activity table, you'd basically be doing a simple SELECT * FROM ACTIVITY (although I don't recommend SELECT * queries -- name your columns). So you take the hit on inserting/updating your activity table, but your select performance is great.

I hope this helps.

Upvotes: 1

w0051977
w0051977

Reputation: 15807

Have you thought about denormalizing the database? Have a look here for more information: http://en.wikipedia.org/wiki/Denormalization.

Upvotes: 0

Related Questions