I have a routine that will be creating individual tables (Sql Server 2008) to store the results of reports generated by my application (Asp.net 3.5). Each report will need its own table, as the columns for the table would vary based on the report settings. A table will contain somewhere between 10-5,000 rows, rarely more than 10,000. The following usage rules will apply: Once stored, the data will never be updated. Whenever results for the table are accessed, all data will be retrieved. No other table will need to perform a join with this table. Knowing this, is there any reason to create a PK index column on the table? Will doing so aid the performance of retrieving the data in any way, and if it would, would this outweigh the extra load of updating the index when inserting data (I know that 10K records is a relatively small amount, but this solution needs to be able to scale). Update : Here are some more details on the data being processed, which goes into the current design decision of one table per report: Tables will record a set of numeric values (set at runtime based on the report settings) that correspond to a different set of reference varchar values (also set at runtime based on the report settings). Whenever data is retrieved, it some post-processing on the server will be required before the output can be displayed to the user (thus I will always be retrieving all values). I would also be suspicious of someone claiming that they had to create a new table for each time the report was run. However, given that different columns (both in number, name and datatype) could conceivably be needed for every time the report was run, I don't see a great alternative. The only other thing I can think of is to have an ID column (identifying the ReportVersionID, corresponding to another table), ReferenceValues column (varchar field, containing all Reference values, in a specified order, separated by some delimiter) and NumericValues column (same as ReferenceValues, but for the numbers), and then when I retrieve the results, put everything into specialized objects in the system, separating the values based on the defined delimiter). Does this seem preferable?

sql-serverdatabasedatabase-designindexingprimary-key

Reputation: 41500

Is there a reason for a Primary Key column that would never be used?

I have a routine that will be creating individual tables (Sql Server 2008) to store the results of reports generated by my application (Asp.net 3.5). Each report will need its own table, as the columns for the table would vary based on the report settings. A table will contain somewhere between 10-5,000 rows, rarely more than 10,000.

The following usage rules will apply:

Once stored, the data will never be updated.
Whenever results for the table are accessed, all data will be retrieved.
No other table will need to perform a join with this table.

Knowing this, is there any reason to create a PK index column on the table? Will doing so aid the performance of retrieving the data in any way, and if it would, would this outweigh the extra load of updating the index when inserting data (I know that 10K records is a relatively small amount, but this solution needs to be able to scale).

Update: Here are some more details on the data being processed, which goes into the current design decision of one table per report:

Tables will record a set of numeric values (set at runtime based on the report settings) that correspond to a different set of reference varchar values (also set at runtime based on the report settings).
Whenever data is retrieved, it some post-processing on the server will be required before the output can be displayed to the user (thus I will always be retrieving all values).

I would also be suspicious of someone claiming that they had to create a new table for each time the report was run. However, given that different columns (both in number, name and datatype) could conceivably be needed for every time the report was run, I don't see a great alternative.

The only other thing I can think of is to have an ID column (identifying the ReportVersionID, corresponding to another table), ReferenceValues column (varchar field, containing all Reference values, in a specified order, separated by some delimiter) and NumericValues column (same as ReferenceValues, but for the numbers), and then when I retrieve the results, put everything into specialized objects in the system, separating the values based on the defined delimiter). Does this seem preferable?

Upvotes: 5

Answers (6)

Unreason

Reputation: 12704

One of the most useful and least emphasized (explicitly) benefits of data integrity (primary keys and foreign key references to start with) is that it forces a 'design by contract' between your data and your application(s); which stops quite a lot of types of bugs from doing any damage to your data. This is such a huge win and a thing that is implicitly taken for granted (it is not 'the database' that protects it, but the integrity rules you specify; forsaking the rules you expose your data to various levels of degradation).

This seems unimportant to you (from the fact that you did not even discuss what would be a possible primary key) and your data seems quite unrelated to other parts of the system (from the fact that you will not do joins to any other tables); but still - if all things are equal I would model the data properly and then if primary keys (or other data integrity rules) are not used and if chasing every last bit of performance I would consider dropping them in production (and test for any actual gains).

As for comments that creating tables is a performance hit - that is true, but you did not tell us how temporary are these tables? Once created will they be heavily used before scrapped? Or do you plan to create tables for just dozen of read operations.

In case you will heavily use these tables and if you will provide clean mechanism for managing them (removing them when not used, selecting them, etc...) I think that dynamically creating the tables would be perfectly fine (you could have shared more details on the tables themselves; use case would be nice)

Notes on other solutions:

EAV model

is horrible unless very specific conditions are met (for example: flexibility is paramount and automating DDL is too much of a hassle). Keep away from it (or be very, very good at anticipating what kinds of queries will you have to deal with and rigorous in validating data on the front end).

XML/BLOB approach

might be the right thing for you if you will consume the data as XML/BLOBs at presentation layer (always read all of the rows, always write the whole 'object' and finally, if your presentation layer likes XML/BLOBS)

EDIT: Also, depending on the usage patterns, having primary key can indeed increase the speed of retrieval, and if I can read the fact that the data will not be updated as 'it will be written once and read many times' then there is a good chance that it will indeed overweight the cost of updating the index on inserts.

Upvotes: 1

Brad

Reputation: 15577

Primary keys are not a MUST for any and all data tables. True, they are usually quite useful and to abandon them is unwise. However, in addition to a primary missions of speed (which I agree would doubtfully be positively affected) is also that of uniqueness. To that end, and valuing the consideration you've already obviously taken, I would suggest that the only need for a primary key would be to govern the expected uniqueness of the table.

Update: You mentioned in a comment that if you did a PK that it would include an Identity column that presently does not exist and is not needed. In this case, I would advise against the PK altogether. As @RedFilter pointed out, surrogate keys never add any value.

Upvotes: 3

A-K

Reputation: 17080

I would keep it simple, just store the report results converted to json or xml, in a VARCHAR(MAX) column

Upvotes: 1

Philip Kelley

Reputation: 40319

What column or columns would the PK index be built on? If just a surrogate identity column, you'll have no performance hit when inserting rows, as they'd be inserted "in order". If it is not a surrogate key, then you have the admittedly minor but still useful assurance that you don't have duplicate entries.

Is the primary key used to control the order in which report rows are to be printed? If not, then how do you ensure proper ordering of the information? (Or is this just a data table that gets summed one way and another whenever a report is generated?)

If you use a clustered primary key, you wouldn't use as much storage space as you would with a non-clustered index.

By and large, I find that while not every table requires a primary key, it does not hurt to have one present, and since proper relational database design requires primary keys on all tables, it's good practice to always include them.

Upvotes: 0

Guffa

Reputation: 700362

If you are really not using the tables for anything other than as a chunk of read-only data, you could just as well store all the reports in a single table, as XML values.

Upvotes: 0

Derick Bailey

Reputation: 72868

will it 1 table for every run of a given report, or one table to all runs of a given report? in other words, if you have Report #1 and you run it 5 times, over a different range of data, will you produce 5 tables, or will all 5 runs of the report be stored in the same table?

If you are storing all 5 runs of the report in the same table, then you'll need to filter the data so that it is appropriate to the run in question. in this case, having a primary key will let you do the where statement for the filter, much faster.

if you are creating a new table for every run of the report, then you don't need a primary key. however, you are going to run into to other performance problems as the number of tables in your system grows... assuming you don't have something in place to drop old data / tables.

Upvotes: 0

Is there a reason for a Primary Key column that would never be used?

Answers (6)

Related Questions