PowerBI-Morgoth

Reputation: 11

What is the best way to use an Upsert method in Azure Data Factory?

I have some CSV files stored in Blob Storage, and each CSV is updated every day. Each update inserts some new rows and modifies some existing rows. I'm using Azure Data Factory (v2) to read that data from Blob Storage and sink it into a SQL database.

The problem is that my process takes around 15 minutes to finish, so I suspect I'm not following best practices.

I don't know exactly how the "Upsert" sink method works, but I think it needs a boolean condition that indicates whether to update a row (if true) or insert it (if false).

I get that condition from a column produced by joining the CSV (source) with the database (destination). Done this way, the column is "null" if the row is new and "not null" if the row already exists in the database. So I insert the rows with the "null" value and update the others.
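The join-based classification described above can be sketched in plain Python to make the logic concrete. This is a minimal sketch, not what Data Factory runs internally: it assumes a hypothetical key column `id`, and a set of keys already fetched from the database stands in for the join against the destination table. A key with no match (the "null" side of the join) becomes an insert; a matched key becomes an update.

```python
import csv
import io
from typing import Dict, List, Set, Tuple


def classify_rows(
    csv_text: str, existing_keys: Set[str], key: str = "id"
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Left-join CSV rows against keys already in the database.

    `key` is a hypothetical key column name used for illustration.
    Rows whose key is not found are inserts; rows whose key is found are updates.
    """
    inserts: List[Dict[str, str]] = []
    updates: List[Dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] in existing_keys:
            updates.append(row)  # "not null" after the join: row already exists
        else:
            inserts.append(row)  # "null" after the join: new row
    return inserts, updates


csv_text = "id,name\n1,alpha\n2,beta-modified\n3,gamma\n"
existing = {"1", "2"}  # keys assumed to be present in the destination table
ins, upd = classify_rows(csv_text, existing)
print([r["id"] for r in ins])
print([r["id"] for r in upd])
```

Note that row `1` is classified as an update even though its data did not change, so with this approach every existing row is rewritten on each run.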

Is this the best/correct way to do this kind of upsert? Could I do something better to improve my run times?

Upvotes: 1

Views: 11196

Answers (1)

Mark Kromer MSFT

Reputation: 3838

Are you using Data Flows? If so, you can update your SQL DB using upsert or separate insert/update paths. Set the policy for which values you wish to update in an Alter Row transformation, then set the Sink for Upsert, Update, and/or Insert. You will need to identify the key column on your sink that we will use as the update key on your database.
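To illustrate what "upsert on a key column" means at the database level, here is a minimal sketch using Python's built-in `sqlite3`. It is not the SQL that the Data Factory sink actually generates; it only shows the semantics: the sink matches each incoming row on the key, updates it if the key exists and inserts it otherwise, so no separate join is needed to compute an insert/update flag. The table `target` and its columns `id` and `name` are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable, Union

Row = Dict[str, Union[int, str]]


def upsert_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Update the row whose key `id` matches; insert it if no row matched.

    Table `target(id, name)` is a hypothetical example schema.
    """
    cur = conn.cursor()
    for row in rows:
        cur.execute(
            "UPDATE target SET name = ? WHERE id = ?", (row["name"], row["id"])
        )
        if cur.rowcount == 0:  # key not found, so this is a new row
            cur.execute(
                "INSERT INTO target (id, name) VALUES (?, ?)",
                (row["id"], row["name"]),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
upsert_rows(conn, [{"id": 2, "name": "beta-modified"}, {"id": 3, "name": "gamma"}])
print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
```

The key column plays the same role as the key you select on the sink: it is the only thing needed to decide between update and insert.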

Upvotes: 1
