Reputation: 17293
I am planning a web application (written in ASP.NET) that manages a database of logged events. The database will be hosted on SQL Server 2008. Each event comes from one of a set of, let's call them, "units." A user will be able to add and remove these "units" via the ASP.NET interface.
Each of the "units" can potentially log up to a million entries, or maybe even more. The cutoff will be enforced by date, for instance:
DELETE FROM [tbl] WHERE [date] < '01-01-2011'
The question I have is: what is the best way to structure such a database?
By placing all entries for all "units" in a single table like this:
CREATE TABLE tblLogCommon (id INT PRIMARY KEY,
idUnit INT,
dtIn DATETIME2, dtOut DATETIME2, etc INT)
Or, by giving each "unit" its own table:
CREATE TABLE tblLogUnit_1 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
CREATE TABLE tblLogUnit_2 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
CREATE TABLE tblLogUnit_3 (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
--and so on
CREATE TABLE tblLogUnit_N (id INT PRIMARY KEY, dtIn DATETIME2, dtOut DATETIME2, etc INT)
Approach #1 seems simpler from the standpoint of referencing entries, because with approach #2 I'll have to deal with a variable number (N) of tables (as I said, users will be allowed to add and remove "units").
But approach #1 may make later access to those log entries very inefficient, and I will have to generate reports from those logs via the ASP.NET interface.
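For example, a typical report against approach #1 would be something along these lines (simplified; @dtFrom and @dtTo would come from the ASP.NET page):
SELECT idUnit,
       COUNT(*)   AS NumEntries,
       MIN(dtIn)  AS FirstEntry,
       MAX(dtOut) AS LastEntry
FROM tblLogCommon
WHERE dtIn >= @dtFrom AND dtIn < @dtTo
GROUP BY idUnit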
So I'd like to hear your take on this before I begin coding.
EDIT: I didn't realize that the number of columns in a table makes a difference. My bad! The actual number of columns in the table is 16.
Upvotes: 1
Views: 221
Reputation: 17293
I did some tests on the actual data with SQL Server 2008 Express, using a local connection (no network latency). The computer this was tested on: desktop, Windows 7 Ultimate 64-bit; CPU: i7 @ 2.8 GHz, 4 cores; RAM: 8 GB; HDD (OS): 1 TB, 260 GB free.
First, all records were placed in a single table (approach #1). All records were generated with random data. A complex SELECT statement processing each particular unitID was run twice (the second run immediately after the first), with CPU load at 12% to 16% and RAM load at 53% to 62%. Here's the outcome:
UnitID    NumRecords    Complex_SELECT_Timing (1st run / 2nd run)
1            486,810    1m:26s / 1m:13s
3          1,538,800    1m:13s / 0m:51s
4            497,860    0m:30s / 0m:24s
5            497,860    1m:20s / 0m:50s
Then the same records were split into four tables with identical structure (approach #2). I ran the same SELECT statement twice, as before, on the same PC, with identical CPU and RAM loads. Here are the results:
Table    NumRecords    Complex_SELECT_Timing (1st run / 2nd run)
t1          486,810    0m:19s / 0m:12s
t3        1,538,800    0m:42s / 0m:38s
t4          497,860    0m:03s / 0m:01s
t5          497,860    0m:15s / 0m:12s
I thought I'd share this with whoever is interested. This pretty much gives you the answer...
Thanks to everyone who contributed!
Upvotes: 0
Reputation: 45096
Splitting the data into separate tables is going to yield better insert and search speed.
With one table, the difference is an index on idUnit. With that index, search speed is going to be nearly as fast as with separate tables (and you can search across idUnits in a single query). Where one table takes a hit is on inserts, but that is a small hit.
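For example, something along these lines (the index name and included columns are just a sketch):
CREATE NONCLUSTERED INDEX IX_tblLogCommon_idUnit_dtIn
    ON tblLogCommon (idUnit, dtIn)
    INCLUDE (dtOut, etc) -- lets a per-unit, date-range query be answered from the index alone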
Upvotes: 1
Reputation: 51494
A lot depends on how you intend to use this data. If you split the data into multiple tables, will you be querying over multiple tables, or will all your queries stay within the defined date range? How often will data be inserted and updated?
In other words, there's no correct answer!
Also, can you afford a license for SQL Server Enterprise in order to use partitioned tables?
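If you're not sure which edition you're running, the standard SERVERPROPERTY metadata function will tell you:
SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion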
Upvotes: 0
Reputation: 166336
I would go with approach #1, as the table does not seem very large (width-wise), and you could apply indexes to improve searching/selecting.
Further to this, you could also look at partitioned tables and indexes.
Creating Partitioned Tables and Indexes
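As a rough sketch (the boundary dates and the single-filegroup mapping below are just placeholders, and partitioned tables require the Enterprise edition on SQL Server 2008), monthly partitioning of the common log table by date could look like this:
CREATE PARTITION FUNCTION pfLogByMonth (DATETIME2)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2011-02-01', '2011-03-01');

CREATE PARTITION SCHEME psLogByMonth
    AS PARTITION pfLogByMonth ALL TO ([PRIMARY]);

CREATE TABLE tblLogCommon (
    id     INT IDENTITY(1,1) NOT NULL,
    idUnit INT NOT NULL,
    dtIn   DATETIME2 NOT NULL,
    dtOut  DATETIME2 NULL,
    etc    INT NULL,
    -- the partitioning column has to be part of the clustered key
    CONSTRAINT PK_tblLogCommon PRIMARY KEY CLUSTERED (dtIn, id)
) ON psLogByMonth (dtIn);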
Upvotes: 3