Reputation: 376
I have a staging table with more than 6 million rows of flattened data from a CSV file that I bulk inserted.
I need to take each rows, convert various column values from varchar
to int
/decimal
/datetime
, and input each row into numerous new database tables all with foreign key relationships, for now I'll simplify this to: (Parent
, Child
, OptionalChild
) .
I don't need to read it row by row, as each single row contains the parent/child/optional child flattened data.
I am currently going through the records row by row using a SELECT TOP 1, then delete each row after its processed but this is taking hours obviously.
Would appreciate some faster / more efficient approaches.
DECLARE @Id UNIQUEIDENTIFIER;
DECLARE @Date DATETIME2;
DECLARE @Code VARCHAR(10);
DECLARE @Number INT;
DECLARE @OptionalChildCode VARCHAR(10);
DECLARE @OptionalChildNumber INT;
WHILE EXISTS(SELECT * FROM Records)
BEGIN
SELECT TOP 1
@Id = Id,
@Date = Date,
@Code = Code,
@Number = Number,
@OptionalChildCode = OptionalChildCode,
@OptionalChildNumber = OptionalChildNumber
FROM Records
DECLARE @OutputTable AS TABLE(Id UNIQUEIDENTIFIER, Name VARCHAR(10))
INSERT INTO (Parent) (ParentDate)
OUTPUT INSERTED.Id, 'ParentId' INTO @OutputTable(Id, Name)
VALUES (CONVERT(DATETIME2,@Date, 20))
INSERT INTO (Child)(ParentId, ChildCode, ChildNumber)
VALUES (
(SELECT ObjectId FROM @OutputTable WHERE Name = 'ParentId'),
@Code,
CONVERT(INT, @Number)
)
IF (@OptionalChildCode IS NULL)
BEGIN
INSERT INTO (Child)(ParentId, ChildCode, ChildNumber)
VALUES (
(SELECT ObjectId FROM @OutputTable WHERE Name = 'ParentId'),
@OptionalChildCode,
CONVERT(INT, @OptionalChildNumber)
)
END
DELETE FROM Records WHERE Id = @Id
END
Records table (all columns from CSV bulk import):
Columns: Id INT, Date VARCHAR(50), Code VARCHAR(50), Number VARCHAR(50), OptionalChildCode VARCHAR(50), OptionalChildNumber VARCHAR(50)
Target tables:
--Parent
Columns: (Id UNIQUEIDENTIFIER, ParentDate DATETIME2)
--Child
Columns: (Id UNIQUEIDENTIFIER, ParentId UNIQUEIDENTIFIER, ChildCode VARCHAR(10), ChildNumber INT)
Sample data (a row from Records table):
1, "2020-01-01-00-00", "Code123", "55", "OptionalCode456", "66"
Expected results:
--Row from Parent table:
111-222-333, 2020-01-01-00-00
--Row from Child table:
333-333-333, 111-222-333, "Code123", 55
--Row from Child table from optional child:
444-444-444, 111-222-333, "OptionalCode456", 66
Upvotes: 0
Views: 746
Reputation: 72119
The issue here is mainly that you need to get the inserted identity numbers matched against the original table, at the same time as inserting multiple child rows. You cannot use OUTPUT
in an INSERT
to output anything other than inserted
columns.
We can start by using a hack involving MERGE
to output the other columns.
We can then conditionally unpivot those rows in order to get one or two child rows to insert.
DECLARE @OutputTable AS TABLE(
Id UNIQUEIDENTIFIER,
Code VARCHAR(10),
Number INT,
OptionalChildCode VARCHAR(10),
OptionalChildNumber INT);
MERGE Parent p
USING Records r
ON 1 = 0 -- never match
WHEN NOT MATCHED THEN
INSERT (ParentDate)
VALUES (CONVERT(DATETIME2, r.[Date], 20))
OUTPUT inserted.Id, r.Code, CONVERT(INT, r.Number), OptionalChildCode, CONVERT(INT, r.OptionalChildNumber)
INTO @OutputTable (Id, Code, Number, OptionalChildCode, OptionalChildNumber)
;
INSERT INTO Child
(ParentId, ChildCode, ChildNumber)
SELECT t.Id, v.Code, v.Number
FROM @OutputTable t
CROSS APPLY (
SELECT t.Code, t.Number
UNION ALL
SELECT t.OptionalChildCode, t.OptionalChildNumber
WHERE OptionalChildCode IS NOT NULL
) v;
Upvotes: 1