Reputation: 7170
I'm writing an application that stores different types of records by user and day. These records are divided into categories.
When designing the database, we create a table User, and then for each record type we create a table RecordType and a table Record.
Example: To store data related to user events we have the following tables:
Event            EventType
-----            ---------
UserId           Id
EventTypeId      Name
Value
Day
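In SQL that layout is roughly the following (the column types here are just placeholders, not our actual schema):

CREATE TABLE EventType (
    Id   INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE Event (
    UserId      INT  NOT NULL,                         -- references User(Id)
    EventTypeId INT  NOT NULL REFERENCES EventType(Id),
    Value       INT  NOT NULL,                         -- placeholder type
    Day         DATE NOT NULL
);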
Our boss pointed out (with some reason) that we're going to store a lot of rows (Users * Days) and suggested an idea that seems a little crazy to me: create a table with a column for each day of the year, like so:
EventTypeId | UserId | Year | 1 | 2 | 3 | 4 | ... | 365 | 366
This way we only have 1 row per user per year, but we're going to get pretty big rows. Since most ORMs (we're going with rails3 for this project) use select * to fetch records, aren't we optimizing one thing only to deoptimize another?
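To illustrate what worries me, a lookup against a hypothetical wide table (I'm calling it EventByYear here) drags back all 366 day columns, while the same pattern on the narrow table only returns the handful of columns we actually have:

-- EventByYear is the hypothetical one-row-per-year table
SELECT * FROM EventByYear WHERE UserId = 42 AND Year = 2011;

-- Narrow table: select * is still a small row
SELECT * FROM Event WHERE UserId = 42 AND Day = '2011-06-01';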
What are the community's thoughts on this?
Upvotes: 1
Views: 258
Reputation: 562368
This is a violation of First Normal Form. It's an example of repeating groups across columns.
Example of why this is bad: write a query to find on which day a given event occurred. You'll need a WHERE clause with 366 terms separated by OR. This is tedious to write, and impossible to index.
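To make that concrete, here is a sketch of the two queries (EventByYear and the quoted day-column names are illustrative only):

-- Wide table: the real clause needs all 366 day columns
SELECT * FROM EventByYear
 WHERE UserId = 42 AND EventTypeId = 7
   AND ("1" = 5 OR "2" = 5 OR "3" = 5 /* ... OR "366" = 5 */);

-- Normalized table: one predicate the optimizer can use an index for
SELECT Day FROM Event
 WHERE UserId = 42 AND EventTypeId = 7 AND Value = 5;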
Relational databases are designed to work well even if you have a lot of rows. Say you have 10,000 users, and on average every user generates 10 events every day. After 10 years, you will have 10,000 * 366 * 10 * 10 rows, or 366,000,000 rows. That's a fairly large database, but not uncommon.
If you design your indexes carefully to match the queries you run against this data, you should be able to have good performance for a long time. You should also have a strategy for partitioning or archiving old data.
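For example (the index and archive-table names here are made up, not a recommendation of exact columns):

-- Covering index for lookups by user, day, and event type
CREATE INDEX ix_event_user_day ON Event (UserId, Day, EventTypeId);

-- Archiving sketch: copy old rows to an archive table, then remove them from the hot table
INSERT INTO EventArchive SELECT * FROM Event WHERE Day < '2001-01-01';
DELETE FROM Event WHERE Day < '2001-01-01';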
Upvotes: 5
Reputation: 6850
I wouldn't do it. As long as you take the time to index the table appropriately, the database server should work well with tables that have lots of rows. If it's significantly slowing down your database performance, I'd start by making sure your queries aren't forcing a lot of full table scans.
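For example, in MySQL or PostgreSQL you can check the plan of a suspect query (table and column names as in the question):

-- Shows whether the lookup uses an index or falls back to a full table scan
EXPLAIN SELECT * FROM Event WHERE UserId = 42 AND Day = '2011-06-01';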
Some other potential problems I see:
If anything, I'd suggest sharding the table based on some other column instead if you really do need to get the table size down. Perhaps by UserId or year?
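As a rough sketch, range partitioning by year in MySQL-style syntax might look like this (details vary by engine, and any unique key would have to include the partition column):

ALTER TABLE Event
PARTITION BY RANGE (YEAR(Day)) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);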
Upvotes: 0
Reputation: 26737
That breaks the database normal form principles:
http://databases.about.com/od/specificproducts/a/normalization.htm
If it's applicable, why not replace the day columns with a DateTime column in your Event table with a default value (GetDate(), since we are talking about SQL)? Then you could group by date...
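A rough sketch in SQL Server syntax (table and column names assumed):

ALTER TABLE Event ADD CreatedAt DATETIME NOT NULL DEFAULT GETDATE();

-- Aggregate per calendar day
SELECT CAST(CreatedAt AS DATE) AS EventDay, COUNT(*) AS Events
FROM Event
GROUP BY CAST(CreatedAt AS DATE);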
Upvotes: 0