Reputation: 748
When working with a Spark notebook for Azure Synapse Analytics (ASA), I can use Scala to save a CSV file as a table in a Dedicated SQL Pool with two simple statements:
%%spark
// [1] - https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html
// [2] - https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri
val testDF = spark.read.format("csv").option("header", "true").load(
"abfss://[email protected]/TIOBE-azure-backup.csv"
);
// [3] - https://www.aizoo.info/post/dropping-a-sql-table-in-your-synapse-spark-notebooks
// [4] - https://stackoverflow.com/questions/67907984/write-data-to-sql-dw-from-apache-spark-in-azure-synapse
testDF.write.mode("overwrite").synapsesql("eary_dedicated_test_sql_pool.dbo.TIOBE_test");
Unfortunately, [3] from above seems to imply the synapsesql function does not exist for PySpark. Has this situation changed since Darren last updated his post?
NOTE: I do not want to mess with configuring the Azure Synapse Dedicated SQL Pool Connector for Apache Spark. My dedicated SQL pool is in the same workspace as my Spark pool, so I don't feel like I should have to hassle with a bunch of authentication options (a sketch of the configuration I'd rather avoid follows the link below):
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export?tabs=scala%2Cscala1%2Cscala2%2Cscala3%2Cscala4%2Cscala5
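For reference, this is roughly the configuration I'd like to avoid, sketched from the linked connector docs (the server, storage, and table names are placeholders, not my actual resources):
%%pyspark
# Sketch of the explicit connector configuration from the linked docs; per those
# docs it should NOT be required when the dedicated pool is in the same workspace.
# All names below are placeholders.
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants

(df.write
    .option(Constants.SERVER, "<workspace-name>.sql.azuresynapse.net")
    .option(Constants.TEMP_FOLDER, "abfss://<container>@<account>.dfs.core.windows.net/tmp")
    .mode("overwrite")
    .synapsesql("<database>.<schema>.<table>"))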
EDIT: The following PySpark code gives me

AttributeError: 'DataFrameWriter' object has no attribute 'synapsesql'

on line 7 (the df.write call):
%%pyspark
df = spark.read.load('abfss://[email protected]/TIOBE-azure-backup.csv', format='csv'
# The CSV has a header row
, header=True
)
# [5] - https://stackoverflow.com/questions/69720753/write-dataframe-to-sql-dedicated-database-using-synapse-analytics
df.write.mode("overwrite").synapsesql("eary_dedicated_test_sql_pool.dbo.TIOBE_test")
Upvotes: 1
Views: 722
Reputation: 66
Python support for synapsesql has been available for about a year now. Just add the imports as per the docs:
# Add required imports
import com.microsoft.spark.sqlanalytics
from com.microsoft.spark.sqlanalytics.Constants import Constants
from pyspark.sql.functions import col  # from the docs' example; not strictly needed for the read below
# Get the table with synapsesql method and expose as temp view
df = spark.read.synapsesql("sandpit_ded.dbo.nation")
df.createOrReplaceTempView("vw_nation")
(Screenshot: Python notebook using the synapsesql method)
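Since the question was ultimately about writing, note that once those imports are in place the write direction works the same way; a minimal sketch reusing the question's table name (assumed, not re-run against their pool):
# With com.microsoft.spark.sqlanalytics imported, DataFrameWriter gains the
# synapsesql method, so the failing line from the question should now resolve:
df.write.mode("overwrite").synapsesql("eary_dedicated_test_sql_pool.dbo.TIOBE_test")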
Upvotes: 3