Jimmy
Jimmy

Reputation: 25

Pipeline keeps failing in Azure Synapse (running a simple query)

I am learning how to create pipelines with the help of a Udemy course in Azure Synapse. I created a link service that connects to my 'master' database of my serverless SQL pool. The connection itself was successful:

enter image description here

I then created a simple 'script activity' in the Pipeline which connects to this linked service and then returns a query.

enter image description here

Here is the full query I used (I masked some of the data):


SELECT
    TOP 100 *, result.filename()
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/trip_data_green_parquet/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    )  
    WITH (
        VendorID INT, 
        lpep_pickup_datetime datetime2(7)
        )   AS [result]

It should be noted that the above query works fine when I run it outside of the pipeline. I then clicked 'validate' to check for errors and no errors were found:

enter image description here

But when I clicked 'Debug' it failed:

enter image description here

Here is the full error code:

{
    "errorCode": "2011",
    "message": "Argument {0} is null or empty.\r\nParameter name: paraKey",
    "failureType": "UserError",
    "target": "Script1",
    "details": []
}

Any help would be greatly appreciated.

I tried running the following SQL query and it worked:

SELECT
    TOP 10 *
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/taxi_zone_without_header.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) 
   WITH(
        LocationID SMALLINT 1,
        Borough VARCHAR(15) COLLATE Latin1_General_100_CI_AI_SC_UTF8 2,
        service_zone VARCHAR(15) COLLATE Latin1_General_100_CI_AI_SC_UTF8 4,
        Zone VARCHAR(50) COLLATE Latin1_General_100_CI_AI_SC_UTF8 3
    ) AS [result]

I'm not too sure why the other one did not work.

Upvotes: 1

Views: 1027

Answers (1)

Aswin
Aswin

Reputation: 7126

"Argument {0} is null or empty.\r\nParameter name: paraKey"

This error is because there is no column name for results.filename() in the SQL script. Try giving result.filename() as file_name in the script.

I repro'd this without alias name for filename() function and got the same error.

enter image description here

  • Then column name is added in script and synapse pipeline is run successfully.

enter image description here

SQL script:

SELECT
    TOP 100 *, result.filename() as file_name
FROM
    OPENROWSET(
        BULK 'https://xxxx.dfs.core.windows.net/taxi/raw/trip_data_green_parquet/year=*/month=*/*.parquet',
        FORMAT = 'PARQUET'
    )  
    WITH (
        VendorID INT, 
        lpep_pickup_datetime datetime2(7)
        )   AS [result]

Upvotes: 1

Related Questions