Sathya
Sathya

Reputation: 3

External Table error with a column value containing comma

I am using Azure synapse to create external table.

The ADLS contains the csv file and the column value has a comma . For example, 123,456.

I tried creating the external table with column datatype VARCHAR, NVARCHAR, FLOAT, DECIMAL. The table is getting created but while selecting using select * from, I am getting Hadoop exception error. Is there a way to select data in the external table with such values.

Error:

"message": "Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Could not find a delimiter after string delimiter.',

It's dedicated sql pool in synapse.

Thanks Sathya

Upvotes: 0

Views: 219

Answers (1)

As you mentioned you are using the CSV as the file format to create external table in dedicated sql pool.

As per Use external tables with Synapse SQL

enter image description here

For the external table creation in the dedicated Sql pool you will need to use Parquet format.

You can create external tables in Synapse SQL pools via the following steps:

CREATE EXTERNAL DATA SOURCE:

CREATE EXTERNAL DATA SOURCE <data_source_name>
WITH
( [ LOCATION = '<prefix>://<path>[:<port>]' ]
  [ [ , ] CREDENTIAL = <credential_name> ]
  [ [ , ] TYPE = HADOOP ]
)
[ ; ]

CREATE EXTERNAL FILE FORMAT:

CREATE EXTERNAL FILE FORMAT file_format_name
WITH (
         FORMAT_TYPE = PARQUET
     [ , DATA_COMPRESSION = {
        'org.apache.hadoop.io.compress.SnappyCodec'
      | 'org.apache.hadoop.io.compress.GzipCodec' }
    ]);
    
    
    CREATE EXTERNAL FILE FORMAT census_file_format
WITH
(  
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
)

CREATE EXTERNAL TABLE:

CREATE EXTERNAL TABLE census_external_table
(
    decennialTime varchar(20),
    stateName varchar(100),
    countyName varchar(100),
    population int,
    race varchar(50),
    sex    varchar(10),
    minAge int,
    maxAge int
)  
WITH (
    LOCATION = '/parquet/',
    DATA_SOURCE = population_ds,  
    FILE_FORMAT = census_file_format
)
GO

SELECT TOP 1 * FROM census_external_table

Reference: Use external tables with Synapse SQL

As you mentioned CSV file from the source through Azure synapse pipeline into ADLS Gen2 storage.

The below are the steps to take the comma within the value into account while creating external tables:

The CSV file in the ADLS is having a column like below.

Col1
123,456

Right click on the CSV file and you will see below options

enter image description here

When you try to create the EXT table you will be able to make changes for the Field Terminator, String delimiter. enter image description here Next you can create the schema. TableName

Here is the script for the external table creation:

IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseDelimitedTextFormat') 
    CREATE EXTERNAL FILE FORMAT [SynapseDelimitedTextFormat] 
    WITH ( FORMAT_TYPE = DELIMITEDTEXT ,
           FORMAT_OPTIONS (
             FIELD_TERMINATOR = ',',
             STRING_DELIMITER = '"',
             FIRST_ROW = 2,
             USE_TYPE_DEFAULT = FALSE
            ))
GO
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'folder02_dileepsynpstg_dfs_core_windows_net') 
    CREATE EXTERNAL DATA SOURCE [folder02_dileepsynpstg_dfs_core_windows_net] 
    WITH (
        LOCATION = 'abfss://[email protected]' 
    )
GO
CREATE EXTERNAL TABLE dbo.tb1 (
    [Col1] nvarchar(4000)
    )
    WITH (
    LOCATION = 'GK.csv',
    DATA_SOURCE = [folder02_dileepsynpstg_dfs_core_windows_net],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
    )
GO
SELECT TOP 100 * FROM dbo.tb1
GO

Results:

Col1
123,456

Upvotes: 0

Related Questions