Reputation: 9490
I'd appreciate it if some SQL experts can take a look at this script and verify if it will work or if it can be improved in anyway (performance, ease, etc.). So, considering the following:
Leads {
LeadId INT,
SourceId TINYINT,
PersonId INT,
PhoneId INT,
EmailId INT,
AddressId INT,
ImporterId SMALLINT,
ImportedDateTime DATETIME2(7)
}
Duplicates {
DuplicateId INT,
SourceId TINYINT,
LeadId INT,
ImporterId SMALLINT,
DuplicatedDateTime DATETIME2(7)
}
How will this script perform:
-- Outside variables provided as part of a stored procedure
DECLARE @SourceId TINYINT;
DECLARE @ImporterId SMALLINT;
PRINT 'Deleting the CSVTemp table if it exists';
IF ((SELECT CASE WHEN OBJECT_ID('CSVTemp') IS NOT NULL THEN 1 ELSE 0 END) = 1)
BEGIN
DROP TABLE [CSVTemp];
END
PRINT 'Creating the CSVTemp table';
CREATE TABLE [CSVTemp](
[FirstName] NVARCHAR(48),
[LastName] NVARCHAR(48),
[Phone] BIGINT,
[Email] VARCHAR(96),
[Street] VARCHAR(64),
[Zip] INT
);
PRINT 'Performing a BULK insert into CSVTemp';
BULK INSERT [CSVTemp] FROM '{File}.csv' WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');
PRINT 'Adding IDENTITY column to CSVTemp';
ALTER TABLE [CSVTemp] ADD [Id] INT IDENTITY(1,1) NOT NULL;
PRINT 'Adding PK constraint to CSVTemp';
ALTER TABLE [CSVTemp] ADD CONSTRAINT [PK_CSVTemp] PRIMARY KEY CLUSTERED(
[Id] ASC
) WITH(
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY];
PRINT 'Counting CSVTemp rows';
DECLARE @Count INT = (SELECT COUNT(1) FROM [CSVTemp]);
PRINT 'Declaring internal variables';
DECLARE @I INT = 0;
PRINT 'Looping through rows in CSVTemp';
WHILE (@I < (@Count + 1))
BEGIN
BEGIN TRANSACTION
DECLARE @FirstName NVARCHAR(48);
DECLARE @LastName NVARCHAR(48);
DECLARE @Phone BIGINT;
DECLARE @Email VARCHAR(96);
DECLARE @Street VARCHAR(64);
DECLARE @Zip INT;
SELECT @FirstName = [FirstName],
@LastName = [LastName],
@Phone = [Phone],
@Email = [Email],
@Street = [Street],
@Zip = [Zip] FROM [CSVTemp] WHERE ([Id] = @I);
DECLARE @LeadId INT = (
SELECT [Leads].[LeadId]
FROM [People].[Person]
JOIN [Management].[Leads] ON ([Leads].[PersonId] = [Person].[PersonId])
JOIN [Communication].[Phones] ON ([Leads].[PhoneId] = [Phones].[PhoneId])
WHERE (([Person].[FirstName] LIKE @FirstName) OR ([Person].[LastName] LIKE @LastName))
AND ([Phones].[PhoneId] = @Phone)
);
IF (@LeadId IS NOT NULL)
BEGIN
INSERT INTO [Management].[Duplicates]([SourceId], [LeadId], [ImporterId]) VALUES(@SourceId, @LeadId, @ImporterId);
END
ELSE
BEGIN
INSERT INTO [People].[Person]([FirstName], [LastName]) VALUES(@FirstName, @LastName);
DECLARE @PersonId INT = @@IDENTITY;
INSERT INTO [Communication].[Phones]([PhoneTypeId], [Number]) VALUES(6, @Phone);
DECLARE @PhoneId INT = @@IDENTITY;
INSERT INTO [Communication].[Emails]([Address]) VALUES(@Email);
DECLARE @EmailId INT = @@IDENTITY;
INSERT INTO [Location].[Addresses]([PostalCode], [Street]) VALUES(@Zip, @Street);
DECLARE @AddressId INT = @@IDENTITY;
INSERT INTO [Management].[Leads]([SourceId], [PersonId], [PhoneId], [EmailId], [AddressId], [ImporterId]) VALUES(@SourceId, @PersonId, @PhoneId, @EmailId, @AddressId, @ImporterId);
END
COMMIT
SET @I = (@I + 1);
END
PRINT 'Deleting CSVTemp table';
DROP TABLE [CSVTemp];
UPDATE
@Will/@Mitch, I don't know if you guys are still around, but I ended up finalizing the script by converting the WHILE
loop to work with a CURSOR
. I tested the script by looping through 10.5k rows and it took 3-5 seconds, which is fine with me. Sadly, I thought I fully understood what @Will was saying about the sets, but I couldn't come up with a way to improve it, so I'll leave it as is. If anyone is willing to give me an example script about what @Will was talking about, I would appreciate it, if not, then thank you @Will and @Mitch for the help so far.
Anyway, here's the updated WHILE
loop that uses a CURSOR
now.
DECLARE @Id INT = 0;
DECLARE C1 CURSOR READ_ONLY FOR (SELECT [Id] FROM [CSVTemp]);
OPEN C1;
FETCH NEXT FROM C1 INTO @Id;
WHILE (@@FETCH_STATUS = 0)
BEGIN
BEGIN TRANSACTION
DECLARE @FirstName NVARCHAR(48);
DECLARE @LastName NVARCHAR(48);
DECLARE @Phone BIGINT;
DECLARE @Email VARCHAR(96);
DECLARE @Street VARCHAR(64);
DECLARE @Zip INT;
SELECT @FirstName = [FirstName],
@LastName = [LastName],
@Phone = [Phone],
@Email = [Email],
@Street = [Street],
@Zip = [Zip] FROM [CSVTemp] WHERE ([Id] = @Id);
DECLARE @LeadId INT = (
SELECT [Leads].[LeadId]
FROM [People].[Person]
JOIN [Management].[Leads] ON ([Leads].[PersonId] = [Person].[PersonId])
JOIN [Communication].[Phones] ON ([Leads].[PhoneId] = [Phones].[PhoneId])
WHERE (([Person].[FirstName] LIKE @FirstName) AND ([Person].[LastName] LIKE @LastName))
AND ([Phones].[Number] = @Phone)
);
IF (@LeadId IS NOT NULL)
BEGIN
INSERT INTO [Management].[Duplicates]([SourceId], [LeadId], [ImporterId]) VALUES (@SourceId, @LeadId, @ImporterId);
END
ELSE
BEGIN
INSERT INTO [People].[Person]([FirstName], [LastName]) VALUES(@FirstName, @LastName);
DECLARE @PersonId INT = SCOPE_IDENTITY();
INSERT INTO [Communication].[Phones]([PhoneTypeId], [Number]) VALUES(6, @Phone);
DECLARE @PhoneId INT = SCOPE_IDENTITY();
INSERT INTO [Communication].[Emails]([Address]) VALUES(@Email);
DECLARE @EmailId INT = SCOPE_IDENTITY();
INSERT INTO [Location].[Addresses]([PostalCode], [Street]) VALUES(@Zip, @Street);
DECLARE @AddressId INT = SCOPE_IDENTITY();
INSERT INTO [Management].[Leads]([SourceId], [PersonId], [PhoneId], [EmailId], [AddressId], [ImporterId]) VALUES(@SourceId, @PersonId, @PhoneId, @EmailId, @AddressId, @ImporterId);
END
COMMIT
FETCH NEXT FROM C1 INTO @Id;
END
CLOSE C1;
DEALLOCATE C1;
Upvotes: 1
Views: 188
Reputation: 300669
Don't use @@IDENTITY
. Use SCOPE_IDENTITY()
:
SCOPE_IDENTITY and @@IDENTITY return the last identity values that are generated in any table in the current session. However, SCOPE_IDENTITY returns values inserted only within the current scope; @@IDENTITY is not limited to a specific scope.
For example, there are two tables, T1 and T2, and an INSERT trigger is defined on T1. When a row is inserted to T1, the trigger fires and inserts a row in T2. This scenario illustrates two scopes: the insert on T1, and the insert on T2 by the trigger.
Assuming that both T1 and T2 have identity columns, @@IDENTITY and SCOPE_IDENTITY will return different values at the end of an INSERT statement on T1. @@IDENTITY will return the last identity column value inserted across any scope in the current session. This is the value inserted in T2. SCOPE_IDENTITY() will return the IDENTITY value inserted in T1. This was the last insert that occurred in the same scope. The SCOPE_IDENTITY() function will return the null value if the function is invoked before any INSERT statements into an identity column occur in the scope.
Upvotes: 2