Nario
Nario

Reputation: 551

process multiple files on azure data lake

let's assume there are two file sets A and B on azure data lake store.

/A/Year/
/A/Month/Day/Month/
/A/Year/Month/Day/A_Year_Month_Day_Hour

/B/Year/
/B/Month/Day/Month/
/B/Year/Month/Day/B_Year_Month_Day_Hour

I want to get some values (let's say DateCreated of A entity) and use these values generate file paths for B set.

how can I achieve that?

some thoughts,but i'm not sure about this. 1.select values from A 2.store on some storage ( azure data lake or azure sql database). 3. build one comma separated string pStr 4. pass pStr via Data Factory to stored procedure which generates file paths with pattern.

EDIT

according to @mabasile_MSFT answer

Here is what i have right now. First USQL script that generates json file, which looks following way.

{
FileSet:["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14",
"/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]
}

ADF pipeline which contains Lookup and second USQL script. Lookup reads this json file FileSet property and as i understood i need to somehow pass this json array to second script right? But usql compiler generates string variable like

DECLARE @fileSet string = "["/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__12", "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__13", "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__14", "/Data/SomeEntity/2018/3/5/SomeEntity_2018_3_5__15"]"

and the script even didn't get compile after it.

Upvotes: 0

Views: 1352

Answers (2)

mabasile_MSFT
mabasile_MSFT

Reputation: 511

You will need two U-SQL jobs, but you can instead use an ADF Lookup activity to read the filesets.

Your first ADLA job should extract data from A, build the filesets, and output to a JSON file in Azure Storage.

Then use a Lookup activity in ADF to read the fileset names from your JSON file in Azure Storage.

Then define your second U-SQL activity in ADF. Set the fileset as a parameter (under Script > Advanced if you're using the online UI) in the U-SQL activity - the value will look something like @{activity('MyLookupActivity').output.firstRow.FileSet} (see Lookup activity docs above).

ADF will write in the U-SQL parameter as a DECLARE statement at the top of your U-SQL script. If you want to have a default value encoded into your script as well, use DECLARE EXTERNAL - this will get overwritten by the DECLARE statements ADF writes in so it won't cause errors.

I hope this helps, and let me know if you have additional questions!

Upvotes: 2

Miguel Domingues
Miguel Domingues

Reputation: 440

Try this root link, that can help you start with all about u-sql: http://usql.io

Usefull link for your question: https://saveenr.gitbooks.io/usql-tutorial/content/filesets/filesets-with-dates.html

Upvotes: -1

Related Questions