Reputation: 319
Our use case requires a configurable input to be passed to a U-SQL script through Azure Data Factory. We are evaluating feasibility, and need the number of inputs passed to the U-SQL script to vary automatically with the interval of the activity's time slice.
For example, if the U-SQL script runs every 4 hours, it needs to read data from 4 different files, one per hour. If the interval is changed to 5 hours, it should automatically read from 5 different files, without the U-SQL script itself having to be updated.
The file path will be in the format: Source\YYYY-MM-DD\hh.csv
Is this dynamic behaviour achievable with the current facilities and functions that Data Factory or Data Lake provide? Or is there a workaround, so that deployment of the Data Factory pipeline can be automated?
Upvotes: 2
Views: 1244
Reputation: 6684
If the files have the same schema, you can use the file set capability in U-SQL as follows:
DECLARE @begin_date DateTime = DateTime.Parse("....");
DECLARE @end_date DateTime = DateTime.Parse("....");

@data =
    EXTRACT filedate DateTime // add the rest of your schema here, comma-separated
    FROM "/Source/{filedate:yyyy}-{filedate:MM}-{filedate:dd}/{filedate:HH}.csv"
    USING Extractors.Csv();

// Note: {filedate:HH} is the 24-hour specifier, matching file names 00.csv–23.csv;
// {filedate:hh} would only match 12-hour values.

@data =
    SELECT *
    FROM @data
    WHERE filedate BETWEEN @begin_date AND @end_date;

// ... rest of script ...
The above script will only access the files whose filedate satisfies the predicate, and will not read the other files.
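To make the behaviour concrete, here is a short Python sketch (nothing to do with U-SQL or ADF; the helper name `files_for_slice` is made up for illustration) that enumerates the hourly file paths a given slice covers under the `Source\YYYY-MM-DD\hh.csv` layout from the question. It uses a half-open interval `[begin, end)`; note that U-SQL's `BETWEEN` is inclusive on both ends, so for hourly slices you may prefer `filedate >= @begin_date AND filedate < @end_date` to avoid picking up the boundary hour twice.

```python
from datetime import datetime, timedelta

def files_for_slice(begin, end):
    """Enumerate the hourly file paths a slice [begin, end) covers,
    mirroring the Source/YYYY-MM-DD/hh.csv layout from the question."""
    paths = []
    t = begin.replace(minute=0, second=0, microsecond=0)
    while t < end:
        paths.append(t.strftime("Source/%Y-%m-%d/%H.csv"))
        t += timedelta(hours=1)
    return paths

# A 4-hour slice covers 4 files; widening it to 5 hours covers 5,
# with no change to the enumeration logic.
print(files_for_slice(datetime(2017, 3, 1, 0), datetime(2017, 3, 1, 4)))
# → ['Source/2017-03-01/00.csv', 'Source/2017-03-01/01.csv',
#    'Source/2017-03-01/02.csv', 'Source/2017-03-01/03.csv']
```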
Then you use ADF's parameterization capability to pass @begin_date and @end_date values that match the interval you want.
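As a sketch of what that parameterization can look like in an ADF (v1) Data Lake Analytics U-SQL activity, the slice boundary system variables SliceStart and SliceEnd can be formatted into script parameters (the script path and linked-service names below are placeholders):

```json
{
  "type": "DataLakeAnalyticsU-SQL",
  "typeProperties": {
    "scriptPath": "scripts\\ReadHourlyFiles.usql",
    "scriptLinkedService": "StorageLinkedService",
    "parameters": {
      "begin_date": "$$Text.Format('{0:yyyy-MM-ddTHH:mm:ss}', SliceStart)",
      "end_date": "$$Text.Format('{0:yyyy-MM-ddTHH:mm:ss}', SliceEnd)"
    }
  }
}
```

Parameters supplied this way are injected as DECLARE statements ahead of the script, so declaring the in-script defaults with DECLARE EXTERNAL lets the ADF-supplied values take precedence (the script can then parse them with DateTime.Parse as above).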
Upvotes: 2