Reputation: 3435
Sorry for my bad English.
Please assume we want to save all commercial actions (buy & sell) for every day, and that each day has
100 or more actions. After 3 years we have a table with more than 100,000 rows. Now we want a
report on the actions done two years ago, between 5/20 and 5/25.
SELECT *
FROM actions
WHERE Date BETWEEN '2012-05-20' AND '2012-05-25'
The problem is that all 100,000 rows must be read to do this.
At first I decided to separate the data into a separate table for each month, but I do not know how to
handle this with Entity Framework. Do you have any suggestions? Thanks.
Upvotes: 0
Views: 109
Reputation: 2466
Don't assume that your query will behave badly when there are X rows in your tables. You should test it!
It is very easy to create a few million rows of test data for each of your tables (this should be done in a development or test environment). Then you can test each of your queries and see exactly how "slow" they will be.
This snippet will create a table and insert 1,000,000 rows into it. Try it, and try running a few different queries against it.
CREATE TABLE [dbo].[Orders](
    [OrderId]     [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [CustomerId]  [int] NOT NULL,
    [ArticleId]   [int] NOT NULL,
    [TotalAmount] [decimal](19, 6) NULL,
    [OrderDate]   DATETIME NOT NULL DEFAULT(GETDATE())
);
WITH
C0(c) AS (SELECT 1 UNION ALL SELECT 1),              -- 2 rows
C1(c) AS (SELECT 1 FROM C0 AS A CROSS JOIN C0 AS B), -- 4 rows
C2(c) AS (SELECT 1 FROM C1 AS A CROSS JOIN C1 AS B), -- 16 rows
C3(c) AS (SELECT 1 FROM C2 AS A CROSS JOIN C2 AS B), -- 256 rows
C4(c) AS (SELECT 1 FROM C3 AS A CROSS JOIN C3 AS B), -- 65,536 rows
C5(c) AS (SELECT 1 FROM C4 AS A CROSS JOIN C4 AS B), -- ~4.3 billion rows
C6(c) AS (SELECT 1 FROM C5 AS A CROSS JOIN C5 AS B), -- more than enough
numbers(n) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM C6)
INSERT dbo.Orders
    (CustomerId,
     ArticleId,
     TotalAmount,
     OrderDate)
SELECT TOP 1000000
    n % 150 + 1,                            -- 150 different customers
    n % 100 + 1,                            -- 100 different articles
    n % 500 + 20,                           -- amounts from 20 to 519
    DATEADD(MINUTE, (n - 1), '2014-01-01')  -- one order per minute
FROM numbers;
The table will contain 1,000,000 orders, placed by 150 different customers, for 100 different articles, for an amount between 20 and 519 each. The orders are placed one minute apart, starting from 2014-01-01 00:00:00.
Using that data, the following query still executed in under one second on my workstation:
SELECT * FROM dbo.Orders WHERE orderDate BETWEEN '2014-05-01' AND '2014-08-01'
Data has a tendency to be much smaller on disk than you think. This table, with ONE MILLION rows in it, still only takes about 70 MB of space.
EXEC sys.sp_spaceused @objname = N'Orders'
--name rows reserved data index_size unused
--Orders 1000000 70432 KB 37560 KB 32072 KB 800 KB
How long does it take to read that many MB from disk? Two to three seconds, worst case, on a desktop.
Adding indexes: to comment on the other answers.
I added an index on the date column, but the query optimizer still thought it was better to scan the entire table. This is probably because it is more expensive to perform lookups for all the orders in the date range than it is to read the whole table sequentially from disk. Depending on the data in the table, such indexes might or might not be used. This is why you should generate test data that matches your expected load; only then can you tune your queries and create the "correct" indexes.
For this particular table, and probably the one in the question, I would suggest putting the CLUSTERED index on the date column instead of the primary key.
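A sketch of what that could look like for the Orders table above (the index name is my own; the primary key stays, it is just made nonclustered so the date column can take the clustered slot):

```sql
-- Sketch: same Orders table, but clustered on the date column.
CREATE TABLE [dbo].[Orders](
    [OrderId]     [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    [CustomerId]  [int] NOT NULL,
    [ArticleId]   [int] NOT NULL,
    [TotalAmount] [decimal](19, 6) NULL,
    [OrderDate]   DATETIME NOT NULL DEFAULT(GETDATE())
);

-- Cluster the table on OrderDate so a date-range query reads one
-- contiguous slice of pages instead of scanning the whole table.
-- (A nonunique clustered index is fine; SQL Server adds a uniquifier.)
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON [dbo].[Orders] ([OrderDate]);
```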
Upvotes: 3
Reputation: 15099
First of all, 100,000 rows won't be a problem for SQL Server; as long as you have the correct indexes, it will do fine even after 30 years.
If you still want to split the data, I wouldn't do it manually; that would be too much of a mess. Just use table partitioning, which is handled by SQL Server itself (see http://technet.microsoft.com/en-us/library/ms190787.aspx for more info).
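As a rough sketch of what that could look like for the actions table from the question (the function, scheme, and column names here are illustrative; real boundary values and filegroups depend on your data):

```sql
-- Sketch: one partition per year, everything on the PRIMARY filegroup.
-- RANGE RIGHT means each boundary date belongs to the partition to its right.
CREATE PARTITION FUNCTION pfActionsByYear (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

CREATE PARTITION SCHEME psActionsByYear
    AS PARTITION pfActionsByYear ALL TO ([PRIMARY]);

-- Create (or rebuild) the table on the partition scheme, keyed by the date.
CREATE TABLE dbo.actions (
    ActionId INT IDENTITY(1,1) NOT NULL,
    [Date]   DATETIME NOT NULL,
    Amount   DECIMAL(19, 6) NULL
) ON psActionsByYear ([Date]);
```

A date-range query like the BETWEEN one in the question then only touches the partitions covering the requested range, and Entity Framework needs no changes, since it still sees a single actions table.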
Upvotes: 1
Reputation: 62093
The problem is that all 100,000 rows must be read to do this.
A: EVEN IF it did, that would be trivial unless you run it on an old mobile phone. I regularly aggregate 100 million rows from a 10 billion row table.
B: Learn what an index is; then not all the rows must be read.
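For the query in the question, that would be a plain nonclustered index on the date column (the index name is my own):

```sql
-- Sketch: with this index the date-range query can seek to the matching
-- rows instead of reading the whole table (whether the optimizer actually
-- uses it depends on how selective the range is).
CREATE NONCLUSTERED INDEX IX_actions_Date ON dbo.actions ([Date]);
```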
Upvotes: 3