Tayyab Anwar

Reputation: 319

Configure dynamic number of parameters in Azure Data Factory Pipeline based on intervals?

Our use case requires a configurable input to be passed to a U-SQL script through Azure Data Factory. We are evaluating feasibility and need to vary the number of variables passed to the U-SQL script depending on the interval of the activity's time slice.

For example, if the U-SQL script runs every 4 hours, it needs to read data from 4 different files, one for each hour. If the interval is changed to 5 hours, it should automatically read from 5 different files, without the U-SQL script having to be updated.

The file path will be in the format: Source\YYYY-MM-DD\hh.csv

Is this kind of dynamic behavior achievable with the current facilities and functions that Data Factory or Data Lake provide? Or is there a workaround so that deployment of the Data Factory pipeline can be automated?

Upvotes: 2

Views: 1244

Answers (1)

Michael Rys

Reputation: 6684

If the files have the same schema, you can use the file set capability in U-SQL as follows:

DECLARE EXTERNAL @begin_date DateTime = DateTime.Parse("...."); // EXTERNAL lets values injected by ADF override these defaults
DECLARE EXTERNAL @end_date DateTime = DateTime.Parse("....");

@data =
  EXTRACT filedate DateTime, // virtual column populated from the file path; insert rest of your schema
  FROM "/Source/{filedate:yyyy}-{filedate:MM}-{filedate:dd}/{filedate:HH}.csv" // HH = 24-hour clock (hh would be 12-hour)
  USING Extractors.Csv();

@data =
  SELECT *
  FROM @data
  WHERE filedate BETWEEN @begin_date AND @end_date;

// ... rest of script ...

The above code will only access the files that satisfy the predicate on filedate and will not read the other files. For instance, with @begin_date = 2016-01-01T00:00:00 and @end_date = 2016-01-01T03:00:00, only /Source/2016-01-01/00.csv through /Source/2016-01-01/03.csv are opened.

Then you use ADF's parameterization capability to set @begin_date and @end_date to the interval you want.
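
The answer doesn't show the ADF side, so here is a minimal sketch of what that parameterization could look like in an ADF v1 pipeline activity (activity, dataset, script path, and linked-service names are hypothetical; check the exact macro syntax against the ADF documentation). The parameters section is turned into DECLARE statements prepended to the script, which is why the script above uses DECLARE EXTERNAL:

{
  "name": "RunUSqlOverSlice",
  "type": "DataLakeAnalyticsU-SQL",
  "typeProperties": {
    "scriptPath": "scripts\\ProcessHourlyFiles.usql",
    "scriptLinkedService": "StorageLinkedService",
    "parameters": {
      "begin_date": "$$Text.Format('{0:yyyy-MM-ddTHH:mm:ss}', SliceStart)",
      "end_date": "$$Text.Format('{0:yyyy-MM-ddTHH:mm:ss}', SliceEnd)"
    }
  },
  "inputs": [ { "name": "HourlyCsvInput" } ],
  "outputs": [ { "name": "ProcessedOutput" } ],
  "scheduler": { "frequency": "Hour", "interval": 4 }
}

If the parameter values arrive as strings, declare the externals as string and convert them with DateTime.Parse inside the script. Changing the scheduler interval from 4 to 5 hours then widens the slice that SliceStart and SliceEnd describe, and the file set predicate automatically picks up the fifth hourly file with no change to the U-SQL script. One wrinkle to watch: BETWEEN is inclusive on both ends, so if @end_date is set from SliceEnd (the exclusive end of the slice), you may want WHERE filedate >= @begin_date AND filedate < @end_date instead.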

Upvotes: 2
