Reputation: 9299
I'm trying to write a query in SQL Server, but it's doing a table scan on a table with about 30 million rows (TGS_INFO), so the query runs very slowly.
The actual query is more complex but I've reduced it to a simpler version that still exhibits the same issue.
SELECT DISTINCT UNIT_ITEMS.DBKEY,
UNIT_ITEMS.ID,
UNIT_ITEMS.LOCATION1,
UNIT_ITEMS.LOCATION2
FROM UNIT_ITEMS
INNER JOIN TGS.dbo.TGS_INFO
ON UNIT_ITEMS.UNIT_ID = TGS_INFO.UNIT_ID AND
UNIT_ITEMS.ITEM_ID = TGS_INFO.ITEM_ID AND
UNIT_ITEMS.LOCATION1 = TGS_INFO.LOCATION1 AND
UNIT_ITEMS.LOCATION2 = TGS_INFO.LOCATION2
Here is the execution plan.
StmtText
|--Sort(DISTINCT ORDER BY:([DbName].[dbo].[UNIT_ITEMS].[DBKEY] ASC, [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID] ASC, [DbName].[dbo].[UNIT_ITEMS].[LOCATION1] ASC, [DbName].[dbo].[UNIT_ITEMS].[LOCATION2] ASC))
|--Hash Match(Inner Join, HASH:([DbName].[dbo].[UNIT_ITEMS].[UNIT_ID], [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID], [DbName].[dbo].[UNIT_ITEMS].[LOCATION1], [DbName].[dbo].[UNIT_ITEMS].[LOCATION2])=([Expr1008], [Expr1009], [Expr1010], [Expr1011]), RESIDUAL:([DbName].[dbo].[UNIT_ITEMS].[UNIT_ID]=[Expr1008] AND [DbName].[dbo].[UNIT_ITEMS].[ITEM_ID]=[Expr1009] AND [DbName].[dbo].[UNIT_ITEMS].[LOCATION1]=[Expr1010] AND [DbName].[dbo].[UNIT_ITEMS].[LOCATION2]=[Expr1011]))
|--Table Scan(OBJECT:([DbName].[dbo].[UNIT_ITEMS]))
|--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[UNIT_ID],0), [Expr1009]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[ITEM_ID],0), [Expr1010]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[LOCATION1],0), [Expr1011]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[LOCATION2],0)))
|--Table Scan(OBJECT:([TGS].[dbo].[TGS_INFO]))
TGS_INFO and UNIT_ITEMS both have nonclustered indexes on UNIT_ID and ITEM_ID. As mentioned, TGS_INFO has about 30 million rows but they are evenly distributed around about a thousand different UNIT_IDs. UNIT_ITEMS always contains only one UNIT_ID.
Here are the indexes:
CREATE NONCLUSTERED INDEX [IX_UNIT_ID_ITEM_ID] ON [dbo].[TGS_INFO]
(
[UNIT_ID] ASC,
[ITEM_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_UNIT_ID_ITEM_ID] ON [dbo].[UNIT_ITEMS]
(
[UNIT_ID] ASC,
[ITEM_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
As I mentioned in the comments, all the columns are VARCHAR(50) in TGS_INFO. All the columns in UNIT_ITEMS are ints.
For the record, I didn't design the schema of TGS_INFO.
Upvotes: 1
Views: 823
Reputation: 32145
I notice the execution plan shows the following:
|--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[UNIT_ID],0), [Expr1009]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[ITEM_ID],0), [Expr1010]=CONVERT_IMPLICIT(nvarchar(50),[TGS].[dbo].[TGS_INFO].[LOCATION1],0), [Expr1011]=CONVERT_IMPLICIT(int,[TGS].[dbo].[TGS_INFO].[LOCATION2],0)))
I can't think of a good reason for the query engine to do an implicit data type conversion on these columns unless the data types between the two tables don't match on the columns you're using for the join.
You may also try moving UNIT_ITEMS.LOCATION1 = TGS_INFO.LOCATION1 AND UNIT_ITEMS.LOCATION2 = TGS_INFO.LOCATION2
to the WHERE
clause since they're not covered by an index. The query engine is typically smart enough to account for this, but it's something to try.
Upvotes: 0
Reputation: 171178
If you don't include LOCATION1
and LOCATION2
in your indexes the join cannot be satisfied from an index alone. Add these columns to the indexes on both tables.
You probably have to include all other columns that are referenced in your query, too.
Upvotes: 3