I am trying to read a dataframe from an Azure Synapse DWH pool using the tutorial provided at https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html
I have set the storage account access key "fs.azure.account.key..blob.core.windows.net" and also specified the temp directory for ADLS in abfss format.
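For reference, my configuration looks roughly like this (the account, container, and key values are placeholders):
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    "<storage-account-access-key>")

# Temp directory for the connector, on ADLS in abfss format
tempdir = "abfss://<container>@<storage-account>.dfs.core.windows.net/tempdir"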
The read operation has the following syntax:
x = (spark.read
     .format("com.databricks.spark.sqldw")
     .option("url", sqlDwUrl)
     .option("tempDir", tempdir)
     .option("forwardSparkAzureStorageCredentials", "true")
     .option("query", "SELECT TOP(1) * FROM " + targetSchema + "." + targetTable)
     .load())
The above executes fine.
When I then try to display the dataframe using display(x), I run into the following error:
SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: Please create a master key in the database or open the master key in the session before performing this operation. [ErrorCode = 15581] [SQLState = S0006]
From the documentation, I understand that a database master key is required, and that has been duly created. Therefore, I am not sure why this error is thrown.
Surprisingly, write operations to Synapse (df.write.format("com.databricks.spark.sqldw")...) work like a charm.
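For reference, the write looks roughly like this (same variables as above):
df.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", sqlDwUrl) \
    .option("tempDir", tempdir) \
    .option("forwardSparkAzureStorageCredentials", "true") \
    .option("dbTable", targetSchema + "." + targetTable) \
    .save()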
I did some research, and based on that I believe the database master key (created by the DBA) is valid for both read and write operations. Is there any way a database master key could restrict read operations but not writes? If not, why could this issue be occurring?
Upvotes: 2
Views: 2448
You would have to create a MASTER KEY first after creating a SQL pool from the Azure Portal. You can do this by connecting through SSMS and running a T-SQL command. If you then try to read from a table in this pool, you will see no error in Databricks.
Going through these docs, Required Azure Synapse permissions for PolyBase:
As a prerequisite for the first command, the connector expects that a database master key already exists for the specified Azure Synapse instance. If not, you can create a key using the CREATE MASTER KEY command.
Next...
Is there any way by which a database master key would restrict read operations, but not write? If not, then why could the above issue be occuring?
If you notice, while writing to SQL you have configured a temp directory in the storage account. The Azure Synapse connector automatically discovers the account access key set and forwards it to the connected Azure Synapse instance by creating a temporary Azure database scoped credential:
Creates a database credential. A database credential is not mapped to a server login or database user. The credential is used by the database to access the external location anytime the database is performing an operation that requires access.
And from here, Open the Database Master Key of the current database:
If the database master key was encrypted with the service master key, it will be automatically opened when it is needed for decryption or encryption. In this case, it is not necessary to use the OPEN MASTER KEY statement.
When a database is first attached or restored to a new instance of SQL Server, a copy of the database master key (encrypted by the service master key) is not yet stored in the server. You must use the OPEN MASTER KEY statement to decrypt the database master key (DMK). Once the DMK has been decrypted, you have the option of enabling automatic decryption in the future by using the ALTER MASTER KEY REGENERATE statement to provision the server with a copy of the DMK, encrypted with the service master key (SMK).
But... from the same docs:
For SQL Database and Azure Synapse Analytics, the password protection is not considered to be a safety mechanism to prevent a data loss scenario in situations where the database may be moved from one server to another, as the Service Master Key protection on the Master Key is managed by Microsoft Azure platform. Therefore, the Master Key password is optional in SQL Database and Azure Synapse Analytics.
As you can read from above, I tried to repro, and yes: after you first create a SQL pool from the Synapse portal, you can write to a table from Databricks directly, but when you try to read the same table you get the exception.
Spark writes the data to the common blob storage as parquet files, and later Synapse uses the COPY statement to load these into the given table. When reading data from a Synapse dedicated SQL pool table, Synapse writes the data from the dedicated SQL pool to the common blob storage as snappy-compressed parquet files, and these are then read by Spark and displayed to you.
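You can see these intermediate files yourself; a quick check in the notebook (assuming tempdir is the same temp directory configured for the connector) would be:
# List the intermediate parquet files the connector leaves under the temp directory
for f in dbutils.fs.ls(tempdir):
    print(f.path)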
We are just setting the blob storage account key and secret in the config for the session, and with forwardSparkAzureStorageCredentials = true the Synapse connector forwards the storage access key to the Azure Synapse dedicated pool by creating an Azure database scoped credential.
Note: you can .load() into a data frame without an exception, but when you try to display(dataframe), the exception pops.
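That is just lazy evaluation; a minimal illustration, reusing the sqlDwUrl and tempdir variables from the question (the table name is a placeholder):
# Builds the plan only; no JDBC query is sent to Synapse yet, so this succeeds
x = (spark.read
     .format("com.databricks.spark.sqldw")
     .option("url", sqlDwUrl)
     .option("tempDir", tempdir)
     .option("forwardSparkAzureStorageCredentials", "true")
     .option("query", "SELECT TOP(1) * FROM dbo.SomeTable")  # placeholder table
     .load())

# Forces execution: the connector now runs the unload in Synapse, and the
# master key error surfaces here if the key is missing
display(x)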
Now, assuming the MASTER KEY exists, connect to your SQL pool database and try the below.
Examples: Azure Synapse Analytics
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'Your-DB-PASS';
GO
CLOSE MASTER KEY;
GO
If you get this error:
Please create a master key in the database or open the master key in the session before performing this operation.
Just CREATE MASTER KEY or use ALTER MASTER KEY:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'ljlLKJjs$2@l23je';
OR
ALTER MASTER KEY REGENERATE WITH ENCRYPTION BY PASSWORD = 'ljlLKJjs$2@l23je';
Upvotes: 4