Reputation: 301
I have a table with the following definition:
CREATE TABLE [dbo].[Transactions]
(
[ID] [varchar](18) NOT NULL,
[TIME_STAMP] [datetime] NOT NULL,
[AMT] [decimal](18, 4) NOT NULL,
[CID] [varchar](90) NOT NULL,
[DEPARTMENT] [varchar](4) NULL,
[SOURCE] [varchar](14) NULL,
PRIMARY KEY NONCLUSTERED
(
[ID] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The table has 75 million rows in it. Somehow, it takes up 20 GB of disk space!
The following 2 queries...
SELECT
SUM(AMT)
FROM
Transactions
WHERE
TIME_STAMP >= '2017-11-11 00:00:00' AND
TIME_STAMP < '2017-11-12 00:00:00'
SELECT
COUNT(DISTINCT(CID))
FROM
Transactions
WHERE
TIME_STAMP >= '2017-11-11 00:00:00' AND
TIME_STAMP < '2017-11-12 00:00:00'
...each take about 2 minutes to run!
BTW, the table has a "Clustered ColumnStore Index" which I can't delete, as last time I tried to delete it, the DROP failed due to insufficient disk space. The drive on which the data is stored has 28 GB free of 50 GB after shrinking DBs.
Please advise on what I need to do, which indexes I need to create with which settings to optimize performance of these queries and WHY so that I and others can understand and learn.
Also, what if I also need to specify DEPARTMENT and SOURCE in some WHERE clauses? These columns have only a few distinct values.
Thank you!
Upvotes: 0
Views: 80
Reputation: 31775
Creating an index on Time_Stamp
that INCLUDES AMT
and CID
should optimize the queries in your question.
"Why" is a broad subject that is not well suited to a stack-overflow answer. Read tutorials and blogs on SQL Server indexing.
EDIT response to comment:
Columns that you filter on in the WHERE clause should be part of the index itself (even if they only have a few distinct values), columns that you just return in the SELECT list should be part of the INCLUDES clause of the index.
Upvotes: 0
Reputation: 5458
You need a clustered index on your table (which typically is the primary key which can be composed of one or more columns that combine to a unique value. A clustered index does not have to be unique but a primary key does. I would guess the id is a good candidate for the primary key. I would then create one index based on the time_stamp and include CID and Amt.
You want your query to use seeks (which means that SQL Server only reads relevant rows) instead of scans (in which SQL Server reads the entire table).
Again we don't see the data but this is what I would suggest:
alter table [dbo].[Transactions] add constraint PK_transactions_id primary key ( id);
create index ix_transactions_time_stamp_amt on transactions(timestamp) include (cid,amt);
Upvotes: 0
Reputation: 1269573
I think one index will suffice:
create index idx_transactions_timestamp_cid_amt on transactions(timestamp, cid, amt);
This covers the first query. The cid
in the index is irrelevant to the use of the index for the query. The index will be scanned based on the date. You could opt to "include" amt
rather than including it as a separate key.
Upvotes: 1
Reputation: 2044
General rule of thumb is to index the columns in your where clause and any join predicates BUT that's a rough guide & don't just wildly apply the index hints from SSMS, they often lie are they will be better ones
If you put an index on TIME_STAMP and include the column in the select as either an included column or as a Key then they should have a positive effect for both queries.
e.g.
CREATE INDEX IX_TEST1 on Transactions (TIME_STAMP) INCLUDE (AMT)
CREATE INDEX IX_TEST1 on Transactions (TIME_STAMP) INCLUDE (CID)
The downside is that these are date and varchar fields so the indexes are going to get pretty big
Upvotes: 0