Reputation: 397
I have a bucket (AWS) in a folder with 3 PARQUET files that are the same and have different names:
I'm trying to create an EXTERNAL TABLE with the code below:
CREATE EXTERNAL TABLE tb_Test
(
coluna_1 INT,
coluna_2 VARCHAR(100)
)
WITH
(
LOCATION = '/testeParquet/**',
DATA_SOURCE = s3_ds,
FILE_FORMAT = ParquetFileFormat
);
GO
when I try to read the External table I get this error message:
Msg 16561, Level 16, State 1, Line 109
External table 'db_S3.dbo.tb_Test' is not accessible because content of directory cannot be listed.
If I inform the name of the file it creates and reads correctly.
But I would like to create with all the files in the folder without having to inform file by file.
Upvotes: 1
Views: 784
Reputation: 21
I had similar problem and it seems that in SQL Server 2022 listing of S3 folder is not supported yet, but listing S3 bucket is working. See this example, in which i selected data from all files in XXX bucket:
This SQL is working (after creating instance CREDENTIAL to S3 bucket):
SELECT
r.filepath()
,r.filename()
,count(*) cnt
FROM OPENROWSET(
BULK 's3://XXX.s3.eu-central-1.amazonaws.com/*.csv'
,FORMAT = 'CSV'
)
WITH (
[col1] VARCHAR(255)
,[col2] VARCHAR(255)
,[col3] VARCHAR(255)
,[col4] VARCHAR(255)
,[col5] VARCHAR(255)
) AS [r]
GROUP BY
r.filepath()
,r.filename()
but after changing bucket to folder:
BULK 's3://XXX.s3.eu-central-1.amazonaws.com/somefolder/*.csv'
SQL Server is returning error:
Msg 13807, Level 16, State 1, Line 75
Content of directory on path 's3://XXX.s3.eu-central-1.amazonaws.com/somefolder/*.csv' cannot be listed.
Found this topic https://learn.microsoft.com/en-us/answers/questions/878813/polybase-s3-not-working-on-sql-server-2022-linux where Microsoft employee is saying that "We are working on this; it has to do with wildcard expansion." and later on "Not in the roadmap, but things can always change.". It looks like it's not implemented yet, unfortunately.
Note: single files can be read from any folder in S3 bucket.
Upvotes: 1
Reputation: 2784
You don't need add '**' end of bucket path.
CREATE EXTERNAL TABLE tb_Test
(
coluna_1 INT,
coluna_2 VARCHAR(100)
)
WITH
(
LOCATION = '/testeParquet/',
DATA_SOURCE = s3_ds,
FILE_FORMAT = ParquetFileFormat
);
GO
Upvotes: 0