GregP
GregP

Reputation: 153

How to properly index SQL Server table with 25 million rows

I have created a table in SQL Server 2008 R2 as follows:

CREATE TABLE [dbo].[7And11SidedDiceGame]
(
    [Dice11Sides] [INT] NULL,
    [Dice7Sides] [INT] NULL,
    [WhoWon] [INT] NULL
)

I added the following index:

CREATE NONCLUSTERED INDEX [idxWhoWon] 
ON [dbo].[7And11SidedDiceGame] ([WhoWon] ASC)

I then created a WHILE loop to Insert 25 million RANDomly generated rows to tally the results for statistical analysis.

Once I optimized the Insert function (Using BEGIN TRAN and COMMIT TRAN before and after the loop) the While loop ran decent. However, analyzing the data takes a long time. For example: using the following statement takes about 4 minutes to perform:

DECLARE @TotalRows real

SELECT @TotalRows = COUNT(*) 
FROM [test].[dbo].[7And11SidedDiceGame]

PRINT REPLACE(CONVERT(VARCHAR, CAST(@TotalRows AS money), 1),'.00','') 

SELECT 
    WhoWon, COUNT(WhoWon) AS Total,  
    ((COUNT(WhoWon) * 100) / @TotalRows) AS PercentWinner
FROM 
    [test].[dbo].[7And11SidedDiceGame]
GROUP BY 
    WhoWon

My question is how can I better index the table to speed up retrieval of the data? Or do I need to approach the pulling of the data in a different manner?

Upvotes: 2

Views: 1524

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

You may be able to speed this by using window functions:

SELECT WhoWon, count(*) AS Total,   
       count(*) * 100.0 / sum(count(*)) over ()  as PercentWinner
FROM [test].[dbo].[7And11SidedDiceGame]
GROUP BY WhoWon;

This does not provide the separate print statement.

For performance, try an index on (WhoWon).

Upvotes: 2

Vladimir Baranov
Vladimir Baranov

Reputation: 32695

I don't think you can do much here.

The query has to read all 25M rows from the index to count them. Though, 25M rows is not that much and I'd expect it to take less than 4 minutes on a modern hardware. It is only 100MB of data to read (OK, in practice it is more, say, 200MB, still it should not take 4 minutes to read 200MB off the disk).

Is the server under a heavy load? Are there a lot of inserts into this table?

You could make a small improvement by defining WhoWon column as NOT NULL in the table. Do you really have NULL values in it?

And then use COUNT(*) instead of count(WhoWon) in the query.

If this query runs often, but the data in the table doesn't change too often, you can create an indexed view that would essentially materialise/cache/pre-calculate these Counts, so the query that would run off such view would be much faster.

Upvotes: 4

Related Questions