Reputation: 11
I have a metadata activity output which is a json of blobs in my container. I want to input these names into my foreach activity where some u-sql query is performed on the blob as per the file name. Is it possible?
Upvotes: 0
Views: 59
Reputation: 3456
You need to include either a SELECT
or an EXTRACT
. Since you are pulling from files, you are going to want to use EXTRACT
.
If I understand your question correctly, you want to run different U-SQL scripts based on the file name.
There are a couple ways to do this:
1) use If conditions in Data Factory to call different U-SQL scripts based on the file name. Nesting the if statements will allow you to have more than two options. There are several string manipulation functions to help you with this. Say one path is @item.Contains('a')
.
{
"name": "<Name of the activity>",
"type": "IfCondition",
"typeProperties": {
"expression": {
"value": "@item() == <file name>",
"type": "Expression"
}
"ifTrueActivities": [
{
"<U-SQL script = 1>"
}
],
"ifFalseActivities": [
{
"<U-SQL script 2>"
}
]
}
}
2) The second option is to use a single U-SQL script and do the split from there. Again, string manipulation functions can help via pattern matching. There is some advantage to this as far as organization goes as you can store the unique scripts in stored procedures and the U-SQL script would simply check the file name passed in and call the relevant stored proc.
//This would be added by data factory
DECLARE @fileName = "/Samples/Data/SearchLog.tsv";
IF @fileName == "/Samples/Data/SearchLog.tsv"
THEN
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM "/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
OUTPUT @searchlog
TO @fileName
USING Outputters.Csv();
ELSE
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int?,
Urls string,
ClickedUrls string
FROM @fileName
USING Extractors.Tsv();
OUTPUT @searchlog
TO "/output/SearchLogResult1.csv"
USING Outputters.Csv();
END;
Something to think about is that Data Lake Analytics is going to be more efficient if you can combine multiple files into one statement. You can have multiple EXTRACT
and OUTPUT
statements. I would encourage you to explore whether or not you could use pattern matching in your EXTRACT
statements to split the U-SQL processing without needing the foreach loop in data factory.
Upvotes: 1