Polybase query from external table not honoring reject options

Question

We are running SQL Server 2019 CU12 with an external data source that points to an ADLS Gen2 storage account. We have two parquet files in the same directory: one file has 2 columns and the other has 3 columns. We did this on purpose to test the reject options, knowing that our schemas will change over time.

/employee/file1.csv (2 columns/5 rows)

/employee/file2.csv (3 columns/5 rows)

Based on the documentation for reject options, we should be able to query the external table and get the non-dirty rows back in the result set, as long as the number of rejected rows stays within the reject configuration listed below.

https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-ver15&tabs=dedicated

CREATE EXTERNAL TABLE [dbo].[Employee] (  
      [FirstName] varchar(100) NOT NULL,
      [LastName] varchar(100) NOT NULL
)  
WITH (LOCATION='/employee/',
      DATA_SOURCE = DATA_LAKE,  
      FILE_FORMAT = ParquetFileFormat,
      REJECT_TYPE = VALUE,
      REJECT_VALUE = 1000000
);
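The DATA_SOURCE and FILE_FORMAT names above refer to objects that are not shown in this post. For readers who want to reproduce the setup, here is a minimal sketch of how such objects are typically defined on SQL Server 2019. Everything in angle brackets is a placeholder, and the credential name is hypothetical; none of these values come from our environment.

```sql
-- Sketch only: placeholders in <...> are assumptions, not values from this post.
-- Assumes a database scoped credential holding the storage account key
-- already exists under the hypothetical name <credential_name>.
CREATE EXTERNAL DATA SOURCE DATA_LAKE
WITH (
      TYPE = HADOOP,
      LOCATION = 'abfss://<container>@<storage_account>.dfs.core.windows.net',
      CREDENTIAL = <credential_name>
);

-- Parquet file format referenced by the external table definition.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (
      FORMAT_TYPE = PARQUET
);
```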

When we select from the external table, we would expect it to return the 5 rows from the file that has 2 columns and reject the 5 rows from the file that has 3 columns. Instead, we get no rows at all and the following exception:

Unexpected error encountered creating the record reader. HadoopExecutionException: Column count mismatch. Source file has 3 columns, external table definition has 2 columns.
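The statement that triggers this error is an ordinary SELECT against the external table. The exact query is not included in this post, so the one below is a minimal sketch that assumes only the column names from the table definition above.

```sql
-- Minimal query against the external table defined earlier.
-- Expected: 5 rows from the 2-column file, with the 3-column file's rows rejected.
-- Actual: the query fails with the column count mismatch error and returns no rows.
SELECT [FirstName],
       [LastName]
FROM   [dbo].[Employee];
```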

I feel like I must be missing something or my understanding of how reject options support file schema differences is incorrect. Can anyone shed any light on this?
