Reputation: 17408
I have a table like this:
CREATE TABLE [dbo].[Table](
[Id] [INT] IDENTITY(1,1) NOT NULL,
[A] [NVARCHAR](150) NULL,
[B] [NVARCHAR](150) NULL,
[C] [NVARCHAR](150) NULL,
[D] [NVARCHAR](150) NULL,
[E] [NVARCHAR](150) NULL,
CONSTRAINT [con] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
and look for performance inprovements to join this table.
Option 1 - Combine all string into nvarchar primary key and then do:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Table.PKString
To my knowledge this is bad practice.
Option 2 - Use:
Source.[A] + Source.[B] + Source.[C] + Source.[D] + Source.[E] = Target.[A] + Target.[B] + Target.[C] + Target.[D] + Target.[E]
Option 3 - Use:
Source.[A] = Target.[A] And
...
Source.[E] = Target.[E]
Upvotes: 3
Views: 1073
Reputation: 453327
Your option 1 won't work correctly as it will treat ('ab','c')
as equal to ('a','bc')
.
Also your columns are nullable and concatenating null yields null.
You can't combine all columns into an nvarchar
primary key due to nullability and even without that you would still be at risk of failure as the max length would be 1,500 bytes which is well over the max index key column size.
For similar reasons of length a composite index using all columns also wouldn't work.
You could create a computed column that uses all those 5 column values as input to calculate a checksum or hash value and index that however.
ALTER TABLE [dbo].[Table]
ADD HashValue AS CAST(hashbytes('SHA1', ISNULL([A], '') + ISNULL([B], '')+ ISNULL([C], '')+ ISNULL([D], '')+ ISNULL([E], '')) AS VARBINARY(20));
CREATE INDEX ix
ON [dbo].[Table](HashValue)
INCLUDE ([A], [B], [C], [D], [E])
Then use that in the join with a residual predicate on the other 5 columns in case of hash collisions.
If you want NULL
to compare equal you could use
SELECT *
FROM [dbo].[Table1] source
JOIN [dbo].[Table2] target
ON source.HashValue = target.HashValue
AND EXISTS(SELECT source.A,
source.B,
source.C,
source.D,
source.E
INTERSECT
SELECT target.A,
target.B,
target.C,
target.D,
target.E)
Note the index created above basically reproduces the whole table so you might want to consider creating as clustered instead if your queries need it to be covering.
Upvotes: 5