HimanshuSPaul

Reputation: 316

Facing an issue writing a PySpark DataFrame to Azure Synapse

I have a PySpark DataFrame in Azure Databricks that I want to write to Azure Synapse, but I am getting the error below.

com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column 'ETL_TableName' has a data type that cannot participate in a columnstore index. 

I checked the connection to Synapse and everything works fine; I am able to read data. But while writing, I get this error. Could anyone please help me handle it?

Code for writing data into Synapse:

dataFrame.repartition(1).write.format("jdbc")\
         .option("url", azureurl)\
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable)\
         .option("append", "true")\
         .save()

Upvotes: 4

Views: 3421

Answers (2)

HimanshuSPaul

Reputation: 316

This may not be the exact answer to the issue above, but it may help someone get past it. I still have no clue about the reason behind the error, but I noticed it only occurs when writing to an Azure Synapse warehouse. Since I had no strict reason to stick with the Synapse warehouse, and the whole priority was to write data from Databricks to Azure in a structured format, I replaced the Azure Synapse warehouse with an Azure SQL Server database, and it is working much better. I will update this answer once I find the actual reason behind the issue.
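For anyone taking the same route, the write to Azure SQL Database can go through Spark's plain JDBC writer. This is only a sketch: the server, database, table, and credential values below are placeholders, not values from the question, and the `df.write.jdbc` call itself needs an active Databricks/Spark session.

```python
def build_jdbc_url(server: str, database: str) -> str:
    """Assemble a SQL Server JDBC URL for an Azure SQL Database.

    `server` is the logical server name (without the
    `.database.windows.net` suffix); `database` is the target DB.
    """
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;"
        f"trustServerCertificate=false;loginTimeout=30"
    )


# Usage inside Databricks (placeholder names; requires a SparkSession):
# dataFrame.write.jdbc(
#     url=build_jdbc_url("myserver", "mydb"),
#     table="dbo.ETL_Target",
#     mode="append",
#     properties={
#         "user": "...",
#         "password": "...",
#         "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
#     },
# )
```

Note that append mode is requested via `mode="append"` rather than a `.option("append", "true")` pair, which the JDBC writer does not recognize.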

Upvotes: 0

CHEEKATLAPRADEEP

Reputation: 12788

A couple of things need to be changed.

The format should be changed from .format("jdbc") to .format("com.databricks.spark.sqldw").

Add the "tableOptions" option to your write statement. It takes the place of the WITH() clause of the CREATE TABLE (AS) statement:

.option ("tableOptions","heap,distribution=MY_DISTRIBUTION")
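To make the mapping concrete, the string you pass here is what ends up inside the WITH (...) clause when the connector creates the table. A tiny illustrative helper (not part of the connector; the exact DDL the connector emits may differ):

```python
def with_clause(table_options: str) -> str:
    """Render a tableOptions string as the equivalent
    CREATE TABLE ... WITH (...) clause (illustrative only)."""
    return f"WITH ({table_options})"


# "heap,distribution=HASH(rownum)" becomes the clause
# WITH (heap,distribution=HASH(rownum)) on the created table.
print(with_clause("heap,distribution=HASH(rownum)"))
```

A heap table avoids the default clustered columnstore index, which is what the column type in the error message cannot participate in.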

The code should look like this:

dataFrame.repartition(1).write.format("com.databricks.spark.sqldw")\
         .option("tableOptions", "heap,distribution=HASH(rownum)")\
         .option("url", azureurl)\
         .option("tempDir", tempDir) \
         .option("forwardSparkAzureStorageCredentials", "true") \
         .option("dbTable", dbTable)\
         .mode("append")\
         .save()

Reference:

Azure Databricks - Azure Synapse Analytics

Choose a value for MY_DISTRIBUTION based on the following guidance:

Guidance for designing distributed tables in Synapse SQL pool

Upvotes: 1
