Blue Owl

Reputation: 131

Azure SQL trigger processing on 5 million rows

I am sending telemetry data to a SQL table made of several columns, and one of these columns receives a varchar that is actually a JSON array, such as:

'[{"data1": 5,"data2": 12, "data3": 2},{"data1": 7,"data2": 8, "data3": 1},{"data1": 4,"data2": 2, "data3": 11}]'

The length of this array can change, but each element of the array contains all three keys/values.

I want to integrate this data into a dedicated table, whose columns are: (LandingTableID, Data1, Data2, Data3)

I have developed a trigger on each insert into my landing table to do this, and it works fine. However, I am looking for the best solution to integrate the existing 5 million rows that are already in my landing table. What would be the best way to proceed here?

As I am using Azure SQL, I cannot use SQL Server Agent jobs (but I can use Azure Functions instead).
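For illustration, an insert trigger of roughly this shape can do the per-insert shredding (placeholder names: LandingTable for the landing table, JsonPayload for the varchar column, DedicatedTable for the target; this is a sketch, not necessarily the exact trigger in use):

-- Sketch of a per-insert shredding trigger (placeholder object/column names)
CREATE TRIGGER trg_LandingTable_ShredJson
ON LandingTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- One output row per element of the JSON array in each inserted landing row
    INSERT INTO DedicatedTable ( LandingTableID, Data1, Data2, Data3 )
    SELECT
        i.LandingTableID,
        JSON_VALUE ( j.[value], '$.data1' ),
        JSON_VALUE ( j.[value], '$.data2' ),
        JSON_VALUE ( j.[value], '$.data3' )
    FROM inserted i
        CROSS APPLY OPENJSON ( i.JsonPayload ) j;
END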

Upvotes: 1

Views: 342

Answers (3)

wBob

Reputation: 14399

Just picking up on Conor's suggestion, you could adapt your trigger code that shreds the JSON to shred the 5 million rows into a temp table. Use SELECT INTO to create the temp table (#) on the fly and use the IDENTITY function to add a row identifier to the temp table. This will allow you to batch up the final INSERT to the main table if you think it's necessary.

Then INSERT the records from the temp table, which are now in record format rather than JSON, directly into your main table. This should only take a few seconds, depending on what other processes are running, what tier your database is set at, etc. This way you split the shredding of the JSON and the final insert into two distinct operations, reducing the load on each one, and make use of tempdb for temporary storage. Something like this:

-- Take the historical JSON and shred it into a temp table as records
SELECT
    IDENTITY( INT, 1, 1 ) AS rowId,
    t.rowId AS originalRowId,
    j.[key],
    JSON_VALUE ( j.[value], '$.data1' ) AS [data1],
    JSON_VALUE ( j.[value], '$.data2' ) AS [data2],
    JSON_VALUE ( j.[value], '$.data3' ) AS [data3]
INTO #tmp
FROM yourLandingTable t
    CROSS APPLY OPENJSON ( t.yourJSON ) j


-- Now insert the records into the main table in batches
INSERT INTO yourMainTable ( data1, data2, data3 )
SELECT data1, data2, data3
FROM #tmp
WHERE rowId BETWEEN 1 AND 1000000


INSERT INTO yourMainTable ( data1, data2, data3 )
SELECT data1, data2, data3
FROM #tmp
WHERE rowId BETWEEN 1000001 AND 2000000

etc ...

INSERT INTO yourMainTable ( data1, data2, data3 )
SELECT data1, data2, data3
FROM #tmp
WHERE rowId > 5000000
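
If you would rather not write each batch out by hand, a WHILE loop over the rowId ranges does the same thing. A sketch, assuming the #tmp table created above and a batch size of one million rows:

-- Insert from the temp table in fixed-size batches using a loop
DECLARE @batchSize INT = 1000000;
DECLARE @fromId INT = 1;
DECLARE @maxId INT = ( SELECT MAX( rowId ) FROM #tmp );

WHILE @fromId <= @maxId
BEGIN
    INSERT INTO yourMainTable ( data1, data2, data3 )
    SELECT data1, data2, data3
    FROM #tmp
    WHERE rowId BETWEEN @fromId AND @fromId + @batchSize - 1;

    SET @fromId += @batchSize;
END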

Upvotes: 1

Conor Cunningham MSFT

Reputation: 4509

You can use OPENJSON to crack open the JSON array and turn it into rows. There is an example on this doc page that shows the pattern you can use:

https://learn.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-ver15
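
For the shape of data in the question, the pattern looks roughly like this (LandingTable and JsonPayload are placeholder names for the landing table and its varchar column; the WITH clause assumes the values are integers, as in the sample):

-- Turn each element of the JSON array into a typed row
SELECT
    t.LandingTableID,
    j.Data1,
    j.Data2,
    j.Data3
FROM LandingTable t
    CROSS APPLY OPENJSON ( t.JsonPayload )
    WITH (
        Data1 INT '$.data1',
        Data2 INT '$.data2',
        Data3 INT '$.data3'
    ) j;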

Upvotes: 1

Martin Cairney

Reputation: 1767

I probably would have designed this with Data Lake as the landing zone for the data, followed by some processing, for example an Azure Function, to turn the JSON into a tabular format, and then the final loads into your SQL tables.

You could orchestrate this with Data Factory.

At those data volumes, a trigger may suffer performance issues.

Upvotes: 1
