Roh1
Roh1

Reputation: 53

How to copy huge data from oracle db to Aure blob storage using copy activity

I am pretty new to azure. I want to copy data from on-premise oracle db to azure blob storage using the copy activity.

I am already using a single copy activity that has a SQL query, but the query return rows in millions. And it takes more than 5 hrs to copy the data

I read some docs suggesting doing the incremental copy. But what steps I need to follow to achieve this ?

Upvotes: 0

Views: 983

Answers (1)

Aswin
Aswin

Reputation: 7156

  • Select a watermark column in the source oracle table. Watermark column should be a column keeps increasing whenever new rows are inserted, or rows get updated.
  • Create a watermark table and store a minimum value of watermark column. By doing this, all data will be moved to target in the first run.
  • Write a stored procedure in database to update the watermark value of watermark table. Stored Procedure should have an input parameter and watermark value should get updated with this input parameter, whenever it is run.
CREATE  PROCEDURE usp_update_watermark 
@LastModifiedtime datetime 
AS 
 BEGIN  
 UPDATE watermarktable 
 SET [WatermarkValue] = @LastModifiedtime
  WHERE [TableName] = @TableName 
END
  • Take two lookup activities in ADF.
  • Lookup activity1 is to read the value from watermark table.
  • Lookup activity2 is to read the maximum watermark value from the source table.
  • Copy data activity is added in sequence to Lookup activities 1 and 2. So that it will run after Both Lookup1 and 2.
  • Write a query in copy data source settings to copy data between lookup activity1 watermark value and lookup activity2 watermark value.
select * from data_source_table where 
LastModifytime > '@{activity('LookupLastWaterMarkActivity').output.firstRow.WatermarkValue}' 
and LastModifytime <= '@{activity('LookupCurrentWaterMarkActivity').output.firstRow.NewWatermarkvalue}'
  • Then take a execute stored procedure activity and include the stored procedure which updates watermark table.

  • Pass the lookup activity2 output as a parameter to Stored procedure. enter image description here

  • Publish and run the pipeline.

  • In first run all data will be copied and watermark table gets updated with new value.

  • From next run, data will be copied from new watermark value only.

Reference: Microsoft document Incrementally copy a table using PowerShell - Azure Data Factory | Microsoft Learn

Upvotes: 0

Related Questions