FelipePerezR

Reputation: 175

Write DataFrame from Databricks to Data Lake

I am manipulating some data with Azure Databricks. The data lives in Azure Data Lake Storage Gen1. I mounted it into DBFS, but now, after transforming the data, I would like to write it back to my data lake.

To mount the data I used the following:

configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
       "dfs.adls.oauth2.client.id": "<your-service-client-id>",
       "dfs.adls.oauth2.credential": "<your-service-credentials>",
       "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<your-directory-id>/oauth2/token"}

dbutils.fs.mount(
    source = "adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>",
    mount_point = "/mnt/<mount-name>",
    extra_configs = configs)

I want to write back a .csv file. For this task I am using the following line:

dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>")

However, I get the following error:

IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'

Any piece of code that could help me? Or a link that walks me through it?

Thanks.

Upvotes: 2

Views: 10931

Answers (1)

Hauke Mallow

Reputation: 3202

If you mount Azure Data Lake Store, you should use the mountpoint to store your data instead of "adl://...". For details on how to mount Azure Data Lake Store (ADLS) Gen1, see the Azure Databricks documentation. You can verify that the mountpoint works with:

dbutils.fs.ls("/mnt/<newmountpoint>")

So after mounting ADLS Gen1, try:

dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("mnt/<mount-name>/<your-directory-name>")

This should work if you added the mountpoint properly and the Service Principal also has access rights on the ADLS.
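
On newer Databricks runtimes the CSV source is built into Spark, so the explicit com.databricks.spark.csv format is not needed. A minimal sketch of the same write against the mount point (the path below is a placeholder, as above):

# Placeholder path - use your own mount name and directory
output_path = "/mnt/<mount-name>/<your-directory-name>"

# Write the transformed DataFrame as CSV files with a header row
dfGPS.write.mode("overwrite").option("header", "true").csv(output_path)

# Sanity check: list the files Spark produced in the target folder
dbutils.fs.ls(output_path)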

Spark always writes multiple files into the output directory, because each partition is saved individually. See also the related Stack Overflow question on this.
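
If you need a single CSV file rather than one part file per partition, you can coalesce the DataFrame to one partition before writing. A minimal sketch, assuming the output is small enough to pass through a single task:

# Collapse to one partition so Spark emits a single part file
# (only advisable for small outputs, since all data goes through one task)
dfGPS.coalesce(1).write.mode("overwrite").option("header", "true").csv("/mnt/<mount-name>/<your-directory-name>")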

Upvotes: 3
