Reputation: 301

What indexes should I create on this table in SQL Server?

I have a table with the following definition:

CREATE TABLE [dbo].[Transactions]
(
    [ID] [varchar](18) NOT NULL,
    [TIME_STAMP] [datetime] NOT NULL,
    [AMT] [decimal](18, 4) NOT NULL,
    [CID] [varchar](90) NOT NULL,
    [DEPARTMENT] [varchar](4) NULL,
    [SOURCE] [varchar](14) NULL,
    PRIMARY KEY NONCLUSTERED 
    (
        [ID] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

The table has 75 million rows in it. Somehow, it takes up 20 GB of disk space!

The following 2 queries...

SELECT 
    SUM(AMT) 
FROM 
    Transactions
WHERE 
    TIME_STAMP >= '2017-11-11 00:00:00' AND 
    TIME_STAMP < '2017-11-12 00:00:00'


SELECT 
    COUNT(DISTINCT(CID))
FROM 
    Transactions
WHERE 
    TIME_STAMP >= '2017-11-11 00:00:00' AND 
    TIME_STAMP < '2017-11-12 00:00:00'

...each take about 2 minutes to run!

BTW, the table has a "Clustered ColumnStore Index" which I can't delete, as last time I tried to delete it, the DROP failed due to insufficient disk space. The drive on which the data is stored has 28 GB free of 50 GB after shrinking DBs.

Please advise on what I need to do, which indexes I need to create with which settings to optimize performance of these queries and WHY so that I and others can understand and learn.

Also, what if I also need to specify DEPARTMENT and SOURCE in some WHERE clauses? These columns have only a few distinct values.

Thank you!

Upvotes: 0

Answers (4)

Tab Alleman

Reputation: 31785

Creating an index on Time_Stamp that INCLUDES AMT and CID should optimize the queries in your question.

"Why" is a broad subject that is not well suited to a stack-overflow answer. Read tutorials and blogs on SQL Server indexing.

EDIT response to comment:

Columns that you filter on in the WHERE clause should be part of the index itself (even if they only have a few distinct values), columns that you just return in the SELECT list should be part of the INCLUDES clause of the index.

Upvotes: 0

benjamin moskovits

Reputation: 5468

You need a clustered index on your table (which typically is the primary key which can be composed of one or more columns that combine to a unique value. A clustered index does not have to be unique but a primary key does. I would guess the id is a good candidate for the primary key. I would then create one index based on the time_stamp and include CID and Amt.

You want your query to use seeks (which means that SQL Server only reads relevant rows) instead of scans (in which SQL Server reads the entire table).

Again we don't see the data but this is what I would suggest:

alter table [dbo].[Transactions] add constraint PK_transactions_id primary key ( id);
create index ix_transactions_time_stamp_amt on transactions(timestamp) include (cid,amt);

Upvotes: 0

Gordon Linoff

Reputation: 1271031

I think one index will suffice:

create index idx_transactions_timestamp_cid_amt on transactions(timestamp, cid, amt);

This covers the first query. The cid in the index is irrelevant to the use of the index for the query. The index will be scanned based on the date. You could opt to "include" amt rather than including it as a separate key.

Upvotes: 1

dbajtr

Reputation: 2044

General rule of thumb is to index the columns in your where clause and any join predicates BUT that's a rough guide & don't just wildly apply the index hints from SSMS, they often lie are they will be better ones

If you put an index on TIME_STAMP and include the column in the select as either an included column or as a Key then they should have a positive effect for both queries.

e.g.

CREATE INDEX IX_TEST1 on Transactions (TIME_STAMP) INCLUDE (AMT)
CREATE INDEX IX_TEST1 on Transactions (TIME_STAMP) INCLUDE (CID)

The downside is that these are date and varchar fields so the indexes are going to get pretty big

Upvotes: 0

What indexes should I create on this table in SQL Server?

Answers (4)

Related Questions