Reputation: 1482
Why does an Azure SQL Data Warehouse Polybase Query to Azure Data Lake Gen 2 return many rows for a single file source, but zero rows for the parent folder source?
I created:
Everything works fine when my external table points to a specific file, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/time001.txt',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Many rows returned, therefore I am confident all items mentioned above are configured correctly.
The Time folder in Azure Data Lake Gen 2 contains many files, not just time001.txt. When I change my external table to point at a folder, and not an individual file, the query returns zero rows, i.e.
CREATE EXTERNAL TABLE [ext].[Time]
(
[TimeID] int NOT NULL,
[HourNumber] tinyint NOT NULL,
[MinuteNumber] tinyint NOT NULL,
[SecondNumber] tinyint NOT NULL,
[TimeInSecond] int NOT NULL,
[HourlyBucket] varchar(15) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
)
WITH
(
LOCATION = '/Time/',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = uncompressedcsv,
REJECT_TYPE = value,
REJECT_VALUE = 2147483647
);
SELECT * FROM [ext].[Time];
Zero rows returned
I tried:
But always zero rows. I also followed the instructions at https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
I tested all the files within the folder and individually each returns many rows of data.
I queried all the files from Blob storage and not ADLS Gen2 and the "Folder" query returns all rows as expected.
How do I query all files in a folder "as one" from Azure Data Lake Gen2 storage using Azure SQL Data Warehouse and Polybase?
Upvotes: 2
Views: 451
Reputation: 11
I was facing the exactly same issue: the problem was on the Data Source protocol.
Script with error:
CREATE EXTERNAL DATA SOURCE datasourcename
WITH (
TYPE = HADOOP,
LOCATION = 'abfss://[email protected]',
CREDENTIAL = credential_name
Script that solves issue:
CREATE EXTERNAL DATA SOURCE datasourcename
WITH (
TYPE = HADOOP,
LOCATION = 'abfss://[email protected]',
CREDENTIAL = credential_name
The only change needed was the LOCATION.
Thanks to the Microsoft Support Team for helping me on this.
Upvotes: 1