Farzin Kanzi

Reputation: 3435

Speed up finding records in a very large SQL table

Sorry for my bad English.

Please assume that we want to save all commercial actions (buy and sell) for every day, and that each day has 100 or more actions. After 3 years we have a table with more than 100,000 rows. Now we want to get a report of the actions done two years ago, between 5/20 and 5/25.

SELECT * 
  FROM actions 
 WHERE Date BETWEEN '2012-05-20' AND '2012-05-25'

The problem is that all 100,000 rows must be read for doing this.

My first idea was to split the data into a separate table for each month, but I do not know how to handle that with Entity Framework. Do you have any suggestions? Thanks.

Upvotes: 0

Views: 109

Answers (3)

Peter Henell

Reputation: 2466

Don't assume that your query will behave badly when there are X rows in your tables. You should test it!

It is very easy to create a few million rows of test data for each of your tables (this should be done in a development or test environment). Then you can test each of your queries and see exactly how "slow" they really are.

This snippet will create a table and insert 1 000 000 rows into it. Try it and try running a few different queries on it.

CREATE TABLE [dbo].[Orders](
    [OrderId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [CustomerId] [int] NOT NULL,
    [ArticleId] [int] NOT NULL,
    [TotalAmount] [decimal](19, 6) NULL,
    [OrderDate] DATETIME NOT NULL DEFAULT(GETDATE())
);

WITH
    -- Tally CTEs: each level squares the row count of the previous one,
    -- so C6 yields far more rows than the 1,000,000 we need
    C0(c) AS (SELECT 1 UNION ALL SELECT 1),
    C1(c) AS (SELECT 1 FROM C0 AS A CROSS JOIN C0 AS B),
    C2(c) AS (SELECT 1 FROM C1 AS A CROSS JOIN C1 AS B),
    C3(c) AS (SELECT 1 FROM C2 AS A CROSS JOIN C2 AS B),
    C4(c) AS (SELECT 1 FROM C3 AS A CROSS JOIN C3 AS B),
    C5(c) AS (SELECT 1 FROM C4 AS A CROSS JOIN C4 AS B),
    C6(c) AS (SELECT 1 FROM C5 AS A CROSS JOIN C5 AS B),

    -- Turn those rows into a sequence 1, 2, 3, ...
    numbers(n) AS (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM C6)

INSERT dbo.Orders
        ( CustomerId ,
          ArticleId ,
          TotalAmount,
          OrderDate
        ) 
SELECT TOP 1000000
        N % 150 + 1, 
        N % 100 + 1, 
        N % 500 + 20,
        DATEADD(MINUTE, (N - 1), '2014-01-01')
FROM numbers;

The table will contain 1 000 000 orders, placed by 150 different customers, for 100 different articles, for an amount between 20 and 519 each. Each order is placed one minute after the previous one, starting from 2014-01-01 00:00:00.

Using that data, the following query still executed in under one second on my workstation:

SELECT * FROM dbo.Orders WHERE orderDate BETWEEN '2014-05-01' AND '2014-08-01' 
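
If you want to check the timing on your own machine, one way is to turn on the built-in statistics output around the query (just a measurement sketch; nothing here is specific to this table):

-- Show logical reads and CPU/elapsed time for the statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM dbo.Orders WHERE OrderDate BETWEEN '2014-05-01' AND '2014-08-01';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

The Messages tab will then show how many pages were read and how long the statement actually took.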

Data have a tendency to be much smaller on disk than you think. This table, with ONE MILLION rows in it, still only takes about 70 MB of space.

EXEC sys.sp_spaceused @objname = N'Orders'
--name      rows        reserved    data        index_size  unused
--Orders    1000000     70432 KB    37560 KB    32072 KB    800 KB

How long does it take to read that many MB from disk? 2-3 seconds, worst case, on a desktop.

Adding indexes (to comment on the other answers):

I added an index on the date column, but the query optimizer still thought it was better to scan the entire table. This is probably because it is more expensive to perform lookups for all those orders in the date range than it is to read it all sequentially from disk. Depending on the data in the table, those indexes might or might not be used. This is why you should generate test data that matches your expected load, only then can you tune your queries and create the "correct" indexes.
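
The index in question would be a plain nonclustered index on the date column, along these lines (the index name is only an example):

-- Nonclustered index on the date column; the optimizer may still
-- prefer a full scan when the date range covers a large share of the table
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);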

For this particular table, and probably the one in the question, I would suggest putting the CLUSTERED index on the date column instead of the primary key.
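
A rough sketch of that layout, assuming you create the table from scratch (the index name is an example):

-- Keep OrderId as the primary key, but cluster the table on the date column
-- so date-range queries read the rows in physical order
CREATE TABLE [dbo].[Orders](
    [OrderId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    [CustomerId] [int] NOT NULL,
    [ArticleId] [int] NOT NULL,
    [TotalAmount] [decimal](19, 6) NULL,
    [OrderDate] DATETIME NOT NULL DEFAULT(GETDATE())
);

CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);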

Upvotes: 3

peter

Reputation: 15099

First of all, 100,000 rows won't be a problem for SQL Server; as long as you have the correct indexes it will do fine even after 30 years.

If you still want to split it, I wouldn't do it manually; that will be too much of a mess. Just use table partitioning, which is handled by SQL Server itself (see http://technet.microsoft.com/en-us/library/ms190787.aspx for more info). A minimal sketch of what that can look like follows.
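
This partitions by year; the table, column names, and boundary dates are only examples, so adjust them to your own data:

-- Partition function: rows are routed to a partition based on their date
CREATE PARTITION FUNCTION pfActionsByYear (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

-- Partition scheme: map every partition to the PRIMARY filegroup for simplicity
CREATE PARTITION SCHEME psActionsByYear
    AS PARTITION pfActionsByYear ALL TO ([PRIMARY]);

-- The table is created on the scheme; SQL Server routes rows automatically,
-- so Entity Framework keeps working against one logical table
CREATE TABLE dbo.Actions (
    ActionId INT IDENTITY(1,1) NOT NULL,
    ActionDate DATETIME NOT NULL,
    Amount DECIMAL(19, 6) NULL
) ON psActionsByYear (ActionDate);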

Upvotes: 1

TomTom

Reputation: 62093

"The problem is that all 100,000 rows must be read for doing this."

A: EVEN IF that were true, it would be trivial unless you run it on an old mobile phone. I regularly aggregate 100 million rows from a 10 billion row table.

B: Learn what an index is; then not all the rows must be read.

Upvotes: 3
