Reputation: 564
This is a question that is very difficult to search for. I did try but I didn't find the kind of thing I am looking for.
I will try to explain this as best as I can.
I have a database setup and I am trying to add on some functionality. I have a concern about performance, data redundancy, and so-called best practices.
I have multiple users in the database and I am trying to set up "benchmarks" for each user. Each user can have a theoretically infinite number of benchmarks (though in reality it is almost always less than 5 but I don't want to constrain myself to that) and each benchmark can have a different set of values for each YEAR moving forward and management estimates the benchmarks for the upcoming 3-4 years. That means each year another benchmark value will exist.
Here is an example.
Person A
Benchmark 1: 2010, 2011, 2012, 2013, 2014, 2015
Benchmark 2: 2010, 2011, 2012, 2013, 2014, 2015
So on and so forth for each user and each benchmark.
My question is this:
Option 1: Set up a benchmark table that has a relationship to the user and columns for each year. I would have to actually update the database each year and add another column for the new year and it could grow out of hand over an extremely long period of time.
Option 2: Set up a table for the benchmark names with a relationship to the user, a table with years, and a table with benchmark data that has a relationship to the benchmark and the year? This seems like a better approach for maintenance sake but it would also mean the same benchmark name would exist multiple times but with a different year and value. I wouldn't have to add a column during a new year, just make sure the year existed in the years table.
Option 3: Something you guys tell me is a better option? I'm hoping.
Assume the data sets are large and there are many users.
Upvotes: 0
Views: 107
Reputation: 54312
Option 2 is probably fine. It will be plenty fast and much more maintainable. If you're concerned about repeating benchmark names, you could split that out too so you end up with:
benchmark_names(benchmark_id, name)
benchmarks(benchmark_id, user_id, year, value)
Note: The benchmark_names
table will presumably be very small (comparatively), so the extra join should be pretty much free, and it may even help performance with a large data set, since the benchmark ID will generally take up less memory than the name.
Upvotes: 3
Reputation: 51369
Why not just a column for user id, benchmark name, year, and value? Something like:
user_id | bm_name | bm_year | value
1234 | "SomeBM" | 2001 | 5
1234 | "SomeBM" | 2002 | 6
1234 | "SomeBM" | 2003 | 7
1234 | "SomeBM" | 2004 | 8
If you make a composite primary key of user id, benchmark name and year, you even get uniqueness guarantees. I don't see a whole lot of benefit from normalizing out a table that just stores the available years (or benchmark names), but that's up to you.
Upvotes: 1