Quynh-Mai Chu
Quynh-Mai Chu

Reputation: 155

Copy data from Parquet file to a table in Azure Synapse by using Polybase T-SQL failing when using a partition column

I tried to copy a parquet file to a table in Azure Synapse by using Polybase T-SQL. Here is an example:

data = [["Jean", 15, "Tennis"], ["Jane", 20, "Yoga"], ["Linda", 35, "Yoga"], ["Linda", 35, "Tennis"]]
columns = ["Name", "Age", "Sport"]
df = spark.createDataFrame(data, columns)

Then I save the dataframe as a parquet file by partitioning by the column "Sport":

df\
.write\
.option("header", True)\
.partitionBy("Sport")\
.format("parquet")\
.mode("overwrite")\
.save("/mnt/test_partition_polybase.parquet")

Then I use Polybase T-SQL for copying from a parquet file to a table in Synapse:

IF OBJECT_ID(N'EVENTSTORE.TEST') IS NOT NULL BEGIN DROP EXTERNAL TABLE EVENTSTORE.TEST END 
CREATE EXTERNAL TABLE EVENTSTORE.TEST(
    [Name] NVARCHAR(250),
    [Age] BIGINT,     
    [SPORT] NVARCHAR(250)
    )
 WITH (DATA_SOURCE = [my_data_source],LOCATION = N'test_partition_polybase.parquet',FILE_FORMAT = [SynapseParquetFormat],REJECT_TYPE = VALUE,REJECT_VALUE = 0)

I get the error:

External file access failed due to internal error: 'File test_partition_polybase.parquet/Sport=Tennis/part-00000-tid-5109481329839379912-631db0ad-cd52-4f9e-acf6-76828a8aa4eb-67-1.c000.snappy.parquet: HdfsBridge::CreateRecordReader - Unexpected error encountered creating the record reader: HadoopExecutionException: Column count mismatch. Source file has 2 columns, external table definition has 3 columns

It's because the partitioned column "Sport" is not in the partitioned file. How can I solve this issue?

Upvotes: 1

Views: 788

Answers (1)

Quynh-Mai Chu
Quynh-Mai Chu

Reputation: 155

I finally solved my problem by creating a duplicated column:

df\
.withColumn("part_Sport", F.col("Sport"))\
.write\
.option("header", True)\
.partitionBy("part_Sport")\
.format("parquet")\
.mode("overwrite")\
.save("/mnt/test_partition_polybase.parquet")

I don't know if it is the best solution but that's the only one I have found.

Upvotes: 0

Related Questions