Ashim Sinha
Ashim Sinha

Reputation: 97

In azure datafactory how to copy data from blob to sql without duplication?

In azure datafactory how to copy data from blob to sql without duplication ie if the pipeline runs at the slice of every 15 min then how to avoid getting duplicate data

Upvotes: 2

Views: 4770

Answers (3)

Todd Klein
Todd Klein

Reputation: 1

I had the same problem and found this link to be helpful: https://www.mssqltips.com/sqlservertip/6365/incremental-file-load-using-azure-data-factory/

In our case, we only add files to blob storage and never modify them after that so the job is to simply pick up new files created within the latest 15 minutes and add them to the SQL container. The Incremental Copy procedure described in the link seems to work great so far.

I can imagine that in some cases you may need to add a stored procedure to act on the SQL container after this, but we did not need it.

Upvotes: 0

Eduardo Pacheco
Eduardo Pacheco

Reputation: 1

I had the same issue and I found that you can add the slice start time and slice end time to your stored procedure and filter the queries using them as any other parameter, that will help you to load the data by slices and not the same data the number of slices you have, hope it's clear enough.

"typeProperties": {
                     "storedProcedureName": "sp_sample",
                     "storedProcedureParameters": {
                         "DateTime": "$$Text.Format('{0:yyyy-MM-dd HH:mm:ss}', SliceStart)"
                     }
                 }

https://learn.microsoft.com/en-us/azure/data-factory/data-factory-stored-proc-activity

Upvotes: 0

Jason H
Jason H

Reputation: 131

The solution isn't automatic, but you can use a Copy Activity, and use a stored procedure in the SQL sink to handle rows that may already exist. Perhaps TSQL Merge statement, or an Insert / Update statement inside.

https://azure.microsoft.com/en-us/documentation/articles/data-factory-copy-activity/

Invoke stored procedure for SQL Sink. When copying data into SQL Server or Azure SQL Database, a user specified stored procedure could be configured and invoked.

Thanks, Jason

Upvotes: 1

Related Questions