Reputation: 316
I have a PySpark dataframe in Azure Databricks. I want to write into Azure Synapse. But i am getting below error.
com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column 'ETL_TableName' has a data type that cannot participate in a columnstore index.
I checked connection for Synapse .All works fine and i am able to read the data. But While writing , i am getting issue . Could anyone please help how to handle this error.
dataFrame.repartition(1).write.format("jdbc")\
.option("url", azureurl)\
.option("tempDir", tempDir) \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", dbTable)\
.option("append", "true")\
.save()
Upvotes: 4
Views: 3421
Reputation: 316
This is not be the exact answer of above issue . But it may help someone to get over . "I still have no clue about reason behind above issue" . But i noticed this issue is coming when trying to write into Azure Synapse Warehouse. As i don't have any strict reason for sticking to Synapse Warehouse and whole priority was to write data from Databricks to Azure in structured format , i replace Azure Synapse warehouse with Azure SQL Server database .And its working much better.Will update the answer once i found the actual reason behind the issue.
Upvotes: 0
Reputation: 12788
Couple of things needs to be changed.
Format should be .format("jdbc")
=> .format("com.databricks.spark.sqldw")
.
Add this option "tableOptions" clause to your write statement. It takes the place of the with() clause of the CREATE TABLE (AS) statement:
.option ("tableOptions","heap,distribution=MY_DISTRIBUTION")
Code should looks like this:
dataFrame.repartition(1).write.format("com.databricks.spark.sqldw")\
.option("tableOptions","heap,distribution=HASH(rownum)")
.option("url", azureurl)\
.option("tempDir", tempDir) \
.option("forwardSparkAzureStorageCredentials", "true") \
.option("dbTable", dbTable)\
.option("append", "true")\
.save()
Reference:
Azure Databricks - Azure Synapse Analytics
Choose a value for MY_DISTRIBUTION based on the following guidance:
Guidance for designing distributed tables in Synapse SQL pool
Upvotes: 1