Surbhi Tayal
Surbhi Tayal

Reputation: 11

is there a way to write u-sql queries without using EXTRACT

I have a metadata activity output which is a json of blobs in my container. I want to input these names into my foreach activity where some u-sql query is performed on the blob as per the file name. Is it possible?

Upvotes: 0

Views: 59

Answers (1)

PerfectlyPanda
PerfectlyPanda

Reputation: 3456

You need to include either a SELECT or an EXTRACT. Since you are pulling from files, you are going to want to use EXTRACT.

If I understand your question correctly, you want to run different U-SQL scripts based on the file name.

There are a couple ways to do this:

1) use If conditions in Data Factory to call different U-SQL scripts based on the file name. Nesting the if statements will allow you to have more than two options. There are several string manipulation functions to help you with this. Say one path is @item.Contains('a').

{
    "name": "<Name of the activity>",
    "type": "IfCondition",
    "typeProperties": {
            "expression":  {
                 "value":  "@item() == <file name>", 
                 "type": "Expression"
             }
            "ifTrueActivities": [
            {
                "<U-SQL script = 1>"
            }
        ],

        "ifFalseActivities": [
            {
                "<U-SQL script 2>"
            }
            ]
    }
}

2) The second option is to use a single U-SQL script and do the split from there. Again, string manipulation functions can help via pattern matching. There is some advantage to this as far as organization goes as you can store the unique scripts in stored procedures and the U-SQL script would simply check the file name passed in and call the relevant stored proc.

//This would be added by data factory
DECLARE @fileName = "/Samples/Data/SearchLog.tsv";

IF @fileName == "/Samples/Data/SearchLog.tsv"
THEN
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM "/Samples/Data/SearchLog.tsv"
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO @fileName
    USING Outputters.Csv();

ELSE
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @fileName
        USING Extractors.Tsv();

    OUTPUT @searchlog
    TO "/output/SearchLogResult1.csv"
    USING Outputters.Csv();
END;

Something to think about is that Data Lake Analytics is going to be more efficient if you can combine multiple files into one statement. You can have multiple EXTRACT and OUTPUT statements. I would encourage you to explore whether or not you could use pattern matching in your EXTRACT statements to split the U-SQL processing without needing the foreach loop in data factory.

Upvotes: 1

Related Questions