Reputation: 2151
We have our data in Azure Data Lake Store and are processing it with an Azure Data Lake Analytics job written in U-SQL. We now have a requirement to push data into different output folders based on the value of a column.
Suppose that after we process the data, we have output like the following:
ID | Name | Company
1 | Midhun | test
2 | Midhun2 | test2
So I would like to write the first record to an output folder named "test"\result.tsv and the second record to an output folder named "test2"\result.tsv.
Will I be able to do this in U-SQL? I am not finding any good reference documentation for U-SQL. Could you please share a link if you know of one?
Upvotes: 0
Views: 296
Reputation: 6684
The current reference documentation is in beta form at http://aka.ms/usql_reference. I am planning an update sometime in February that adds more of the missing sections and improves the navigation. Later revisions will add more sample code.
What you would basically like is for us to support this feature: https://feedback.azure.com/forums/327234-data-lake/suggestions/10550388-support-dynamic-output-file-names-in-adla. If you haven't done so, please add your vote to it.
We are actually looking into providing this capability between "now" and the time we release the GA version of the service.
Until the capability becomes available, you basically have to use one script to get the names of the output folders, and then generate a second script that writes the data into the different files.
For example, you would get only the Company column values from above, and then generate a script (using T4, PowerShell, or whatever other tool you like to use, including U-SQL itself :)) that has the following general format:
// ... Your U-SQL processing to get the rowset you want to split; let's call it @data
OUTPUT (SELECT * FROM @data WHERE Company == "<insert value of directory name>")
TO "/output/<insert value of directory name>/result.tsv"
USING Outputters.Tsv();
// ... Repeat the above statement for every directory name
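As a rough illustration of the generation step, here is a minimal Python sketch that stamps out one OUTPUT statement per directory name, following the template above. Python is just one possible tool here. The function names, the "/output" root, and the hard-coded list of Company values are assumptions for illustration; the sketch also assumes the Company values have already been retrieved by a first script and are safe to use as directory names.

```python
def usql_string_literal(value: str) -> str:
    """Quote a value as a C#-style string literal for use in U-SQL."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def generate_output_statements(companies, rowset="@data", root="/output"):
    """Build one OUTPUT statement per distinct company value.

    `rowset` and `root` are hypothetical defaults matching the template above.
    """
    statements = []
    for company in sorted(set(companies)):  # one statement per distinct value
        statements.append(
            f"OUTPUT (SELECT * FROM {rowset} "
            f"WHERE Company == {usql_string_literal(company)})\n"
            f'TO "{root}/{company}/result.tsv"\n'
            "USING Outputters.Tsv();"
        )
    return "\n\n".join(statements)


if __name__ == "__main__":
    # Assumed input: Company values collected by the first script.
    print(generate_output_statements(["test", "test2", "test"]))
```

The generated text would be appended to the U-SQL that defines @data and then submitted as the second job.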
Upvotes: 2