dcarneiro
dcarneiro

Reputation: 7170

Database Optimization - Store each day in a different column to reduce rows

I'm writing an application that stores different types of records by user and day. These records are divided in categories.

When designing the database, we create a table User and then for each record type we create a table RecordType and a table Record.

Example: To store data related to user events we have the following tables:

Event         EventType
-----         ---------
UserId        Id
EventTypeId   Name
Value
Day

Our boss pointed out (with some reason) that we're gonna store a lot of rows ( Users * Days ) and suggested an idea that seems a little crazy to me: Create a table with a column for each day of the year, like so:

EventTypeId | UserId | Year | 1 | 2 | 3 | 4 | ... | 365 | 366

This way we only have 1 row per user per year, but we're gonna get pretty big rows. Since most ORMs (we're going with rails3 for this project) use select * to get the database records, aren't we optimizing something to "deoptimize" another?

What's the community thoughs about this?

Upvotes: 1

Views: 258

Answers (3)

Bill Karwin
Bill Karwin

Reputation: 562368

This is a violation of First Normal Form. It's an example of repeating groups across columns.

Example of why this is bad: Write a query to find which day a given event occurred. You'll need a WHERE clause with 366 terms, separated by OR. This is tedious to write, and impossible to index.

Relational databases are designed to work well even if you have a lot of rows. Say you have 10000 users, and on average every user generates 10 events every day. After 10 years, you will have 10000*366*10*10 rows, or 366,000,000 rows. That's a fairly large database, but not uncommon.

If you design your indexes carefully to match the queries you run against this data, you should be able to have good performance for a long time. You should also have a strategy for partitioning or archiving old data.

Upvotes: 5

Sean U
Sean U

Reputation: 6850

I wouldn't do it. As long as you take the time to index the table appropriately, the database server should work well with tables that have lots of rows. If it's significantly slowing down your database performance, I'd start by making sure your queries aren't forcing a lot of full table scans.

Some other potential problems I see:

  • It probably will hurt ORM performance.
  • It's going to create maintainability problems on down the road. You probably don't want to be working with objects that have 366 fields for every day of the year, so there's probably going to have to be a lot of boilerplate glue code to keep track of.
  • Any query that wants to search against a range of dates is going to be an unholy mess.
  • It could be even more wasteful of space. These rows are big, and the number of rows you have to create for each customer is going to be the sum of the maximum number of times each different kind of event happened in a single day. Unless the rate at which all of these events happens is very constant and regular, those rows are likely to be mostly empty.

If anything, I'd suggest sharding the table based on some other column instead if you really do need to get the table size down. Perhaps by UserId or year?

Upvotes: 0

Massimiliano Peluso
Massimiliano Peluso

Reputation: 26737

That's breaks the DataBase normal forms principles

http://databases.about.com/od/specificproducts/a/normalization.htm

if it's applicable why don't you replace Day columns with a DateTime column in your event table with a default value (GetDate() we are talking about SQL)

then you could group by Date ...

Upvotes: 0

Related Questions