Midhun Murali

Reputation: 2151

Moving Data to Different Output folders based on column value

Our data is in Azure Data Lake Store, and we process it with an Azure Data Lake Analytics job written in U-SQL. We now have a requirement to write the data to different output folders based on the value of a column.

Suppose that after processing the data we have output like the following:

ID | Name | Company
1 | Midhun | test
2 | Midhun2 | test2

So I would like to write the first record to "test"\result.tsv and the second record to "test2"\result.tsv, where "test" and "test2" are output folders named after the Company value.

Is this possible in U-SQL? I have not found any good reference documentation for U-SQL. If you know of any, please share the link.

Upvotes: 0

Views: 296

Answers (1)

Michael Rys

Reputation: 6684

The current reference documentation is in beta form at http://aka.ms/usql_reference. I am planning an update sometime in February that adds more of the missing sections and improves the navigation. Later revisions will add more sample code.

What you would basically like is for us to support this feature: https://feedback.azure.com/forums/327234-data-lake/suggestions/10550388-support-dynamic-output-file-names-in-adla. If you haven't done so, please add your vote to it.

We are actually looking into providing this capability between "now" and the time we release the GA version of the service.

Until the capability becomes available, you basically have to use one script to get the names of the output folders and then generate a second script that writes the data into the different files.

E.g., you would select only the distinct Company values from above, and then generate a script (using T4, PowerShell, or whatever other tool you like to use, including U-SQL itself :)) that generically has the following format:

... Your U-SQL processing to get the rowset you want to split; let's call it @data

OUTPUT (SELECT * FROM @data WHERE Company == "<insert value of directory name>") 
TO "/output/<insert value of directory name>/result.tsv"
USING Outputters.Tsv();

... Repeat the above statement for every directory name
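
As a rough illustration of the generation step, here is a minimal sketch in Python (just one possible tool; T4 or PowerShell would work equally well). The function name build_split_script and the output_root parameter are hypothetical, and it assumes you have already fetched the distinct Company values, for example by outputting them from a first U-SQL job. The quote and slash check is only a naive guard, not complete escaping.

```python
def build_split_script(companies, output_root="/output"):
    """Return U-SQL text with one OUTPUT statement per distinct company value.

    Assumes the generated text is appended after the U-SQL that defines @data.
    """
    statements = []
    for company in sorted(set(companies)):
        # Naive guard (assumption): reject values that would break the
        # generated string literal or change the output path.
        if '"' in company or "/" in company or "\\" in company:
            raise ValueError(f"unsafe directory name: {company!r}")
        statements.append(
            f'OUTPUT (SELECT * FROM @data WHERE Company == "{company}")\n'
            f'TO "{output_root}/{company}/result.tsv"\n'
            "USING Outputters.Tsv();"
        )
    return "\n\n".join(statements)


if __name__ == "__main__":
    # Values taken from the example table in the question.
    print(build_split_script(["test", "test2"]))
```

You would then append the generated text to the U-SQL that produces @data and submit the combined script as the second job.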

Upvotes: 2
