mb1231
mb1231

Reputation: 376

SQL Server: Fastest way to transform millions of rows of data from one table to multiple others

I have a staging table with more than 6 million rows of flattened data from a CSV file that I bulk inserted.

I need to take each rows, convert various column values from varchar to int/decimal/datetime, and input each row into numerous new database tables all with foreign key relationships, for now I'll simplify this to: (Parent, Child, OptionalChild) .

I don't need to read it row by row, as each single row contains the parent/child/optional child flattened data.

I am currently going through the records row by row using a SELECT TOP 1, then delete each row after its processed but this is taking hours obviously.

Would appreciate some faster / more efficient approaches.

DECLARE @Id UNIQUEIDENTIFIER;
DECLARE @Date DATETIME2;
DECLARE @Code VARCHAR(10);
DECLARE @Number INT;
DECLARE @OptionalChildCode VARCHAR(10);
DECLARE @OptionalChildNumber INT;

WHILE EXISTS(SELECT * FROM Records)
BEGIN
   SELECT TOP 1 
      @Id = Id, 
      @Date = Date, 
      @Code = Code,
      @Number = Number,
      @OptionalChildCode = OptionalChildCode,
      @OptionalChildNumber = OptionalChildNumber
   FROM Records   

   DECLARE @OutputTable AS TABLE(Id UNIQUEIDENTIFIER, Name VARCHAR(10))

   INSERT INTO (Parent) (ParentDate)
   OUTPUT INSERTED.Id, 'ParentId' INTO @OutputTable(Id, Name)
   VALUES (CONVERT(DATETIME2,@Date, 20))

   INSERT INTO (Child)(ParentId, ChildCode, ChildNumber)
   VALUES (
      (SELECT ObjectId FROM @OutputTable WHERE Name = 'ParentId'),
      @Code,
      CONVERT(INT, @Number)
   )

   IF (@OptionalChildCode IS NULL)
     BEGIN
        INSERT INTO (Child)(ParentId, ChildCode, ChildNumber)
        VALUES (
            (SELECT ObjectId FROM @OutputTable WHERE Name = 'ParentId'),
            @OptionalChildCode,
            CONVERT(INT, @OptionalChildNumber)
         )
     END 

   DELETE FROM Records WHERE Id = @Id
END 

Records table (all columns from CSV bulk import):

Columns: Id INT, Date VARCHAR(50), Code VARCHAR(50), Number VARCHAR(50), OptionalChildCode VARCHAR(50), OptionalChildNumber VARCHAR(50)

Target tables:

--Parent 
Columns: (Id UNIQUEIDENTIFIER, ParentDate DATETIME2)

--Child
Columns: (Id UNIQUEIDENTIFIER, ParentId UNIQUEIDENTIFIER, ChildCode VARCHAR(10), ChildNumber INT)

Sample data (a row from Records table):

1, "2020-01-01-00-00", "Code123", "55", "OptionalCode456", "66"

Expected results:

--Row from Parent table:
111-222-333, 2020-01-01-00-00

--Row from Child table:
333-333-333, 111-222-333, "Code123", 55

--Row from Child table from optional child:
444-444-444, 111-222-333, "OptionalCode456", 66

Upvotes: 0

Views: 746

Answers (1)

Charlieface
Charlieface

Reputation: 72119

The issue here is mainly that you need to get the inserted identity numbers matched against the original table, at the same time as inserting multiple child rows. You cannot use OUTPUT in an INSERT to output anything other than inserted columns.

We can start by using a hack involving MERGE to output the other columns.

We can then conditionally unpivot those rows in order to get one or two child rows to insert.

DECLARE @OutputTable AS TABLE(
  Id UNIQUEIDENTIFIER,
  Code VARCHAR(10),
  Number INT,
  OptionalChildCode VARCHAR(10),
  OptionalChildNumber INT);

MERGE Parent p
USING Records r
ON 1 = 0   -- never match
WHEN NOT MATCHED THEN
  INSERT (ParentDate)
  VALUES (CONVERT(DATETIME2, r.[Date], 20))
OUTPUT inserted.Id, r.Code, CONVERT(INT, r.Number), OptionalChildCode, CONVERT(INT, r.OptionalChildNumber)
  INTO @OutputTable (Id, Code, Number, OptionalChildCode, OptionalChildNumber)
;

INSERT INTO Child
  (ParentId, ChildCode, ChildNumber)
SELECT t.Id, v.Code, v.Number
FROM @OutputTable t
CROSS APPLY (
    SELECT t.Code, t.Number
    UNION ALL
    SELECT t.OptionalChildCode, t.OptionalChildNumber
    WHERE OptionalChildCode IS NOT NULL
) v;

Upvotes: 1

Related Questions