arushi joshi
arushi joshi

Reputation: 1

How to create an incremental load with Salesforce as source in Azure data factory?

Is there any way we can fetch the max of last modified date from the last processed file and store it in a config table

Upvotes: 0

Views: 2884

Answers (1)

KarthikBhyresh-MT
KarthikBhyresh-MT

Reputation: 5034

From Supported data stores and formats you can see Salesforce, Salesforce service cloud and Marketing cloud are supported.

You have to perform the following steps:

  • Prepare the data store to store the watermark value.
  • Create a data factory.
  • Create linked services.
  • Create source, sink, and watermark datasets.
  • Create a pipeline.
  • Run the pipeline.

Follow this to setup Linked Service with Salesforce in Azure Data Factory

enter image description here

When copying data from Salesforce, you can use either SOQL query or SQL query. Note that these two has different syntax and functionality support, do not mix it. You are suggested to use the SOQL query, which is natively supported by Salesforce.

Process for incremental loading, or delta loading, of data through a watermark:

In this case, you define a watermark in your source database. A watermark is a column that has the last updated time stamp or an incrementing key. The delta loading solution loads the changed data between an old watermark and a new watermark. The workflow for this approach is depicted in the following diagram:

enter image description here

ADF will scan all the files from the source store, apply the file filter by their LastModifiedDate, and only copy the new and updated file since last time to the destination store.

For capabilities, Prerequisites and Salesforce request limits refer Copy data from and to Salesforce by using Azure Data Factory

Refer doc: Delta copy from a database with a control table This article describes a template that's available to incrementally load new or updated rows from a database table to Azure by using an external control table that stores a high-watermark value.

This template requires that the schema of the source database contains a timestamp column or incrementing key to identify new or updated rows.

The template contains four activities:

  • Lookup retrieves the old high-watermark value, which is stored in an external control table.
  • Another Lookup activity retrieves the current high-watermark value from the source database.
  • Copy copies only changes from the source database to the destination store. The query that identifies the changes in the source database is similar to 'SELECT * FROM Data_Source_Table WHERE TIMESTAMP_Column > “last high-watermark” and TIMESTAMP_Column <= “current high-watermark”'.
  • StoredProcedure writes the current high-watermark value to an external control table for delta copy next time.

Go to the Delta copy from Database template. Create a New connection to the source database that you want to data copy from.enter image description here

Create connections to the external control table and stored procedure that you created and Select Use this template.

Choose the available pipeline

For Stored procedure name, choose [dbo].[update_watermark]. Select Import parameter, and then select Add dynamic content.

enter image description here

click Add dynamic content and Type in below query. This will get a maximum date in your watermark column that we can use for delta slice.

You can use this query to fetch the max of last modified date from the last processed file

select MAX(LastModifytime) as NewWatermarkvalue from data_source_table"

or

For files only you can use Incrementally copy new and changed files based on LastModifiedDate by using the Copy Data tool

Refer: Source: ADF Incremental loading with configuration stored in a table

Upvotes: 2

Related Questions