Copying files from Databricks to blob storage results in files of size 0

I am trying to copy a file from Databricks to a location in blob storage using the command below:

dbutils.fs.cp('dbfs:/FileStore/tables/data/conv_subset_april_2018.csv',"wasb://blobname@outputcontainername.blob.core.windows.net/" + "conv_subset_april_2018" + ".csv")

Now blobname and outputcontainername are correct, and I have copied files to this storage location before. It is only today that the command produces files of size 0. The source file also exists at the given location and is not empty or corrupted. Does anyone have any idea what might be happening? The screenshot shows what I mean.

[Screenshot: files of size 0]
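
For reference, the size of the source file can be checked directly with dbutils.fs.ls, which reports sizes in bytes (a minimal check, using the same path as in the command above):

    # Returns a list of FileInfo entries; each has name, path, and size
    dbutils.fs.ls('dbfs:/FileStore/tables/data/conv_subset_april_2018.csv')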

Upvotes: 3

Views: 15966

Answers (1)

Peter Pan

Reputation: 24148

As I know, there are two ways to copy a file from Azure Databricks to Azure Blob Storage. Please refer to the official document Azure Blob Storage under the Data Sources topic of the Azure Databricks documentation for more details.

Here is my sample code for both ways, followed by a quick verification snippet after the two options.

  1. Mount a container of Azure Blob Storage to Azure Databricks as a DBFS path; then you can cp your file from a Databricks path to the mounted Blob Storage path. Please refer to Mount Azure Blob Storage containers with DBFS.

    # Mount the Blob Storage container at /mnt/<mount-name>
    dbutils.fs.mount(
        source = "wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net",
        mount_point = "/mnt/<mount-name>",
        extra_configs = {"fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net":"<your-storage-account-key>"})
    # Copy from the DBFS path to the mounted path; the dbfs: scheme prefix is optional
    dbutils.fs.cp('dbfs:/FileStore/tables/data/conv_subset_april_2018.csv','dbfs:/mnt/<mount-name>/conv_subset_april_2018.csv')
    # Or dbutils.fs.cp('/FileStore/tables/data/conv_subset_april_2018.csv','/mnt/<mount-name>/conv_subset_april_2018.csv')
    
  2. Set up an account access key or a SAS for a container, then copy a file from a DBFS file path to a wasbs file path.

    # Set up the storage account access key for this session
    spark.conf.set(
        "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
        "<your-storage-account-access-key>")
    # Or set up a SAS for the container:
    # spark.conf.set(
    #     "fs.azure.sas.<your-container-name>.<your-storage-account-name>.blob.core.windows.net",
    #     "<complete-query-string-of-your-sas-for-the-container>")
    dbutils.fs.cp('/FileStore/tables/data/conv_subset_april_2018.csv','wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/conv_subset_april_2018.csv')
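
With either option, you can confirm that the copy actually produced a non-empty file by listing the target. A minimal check, assuming the mount point /mnt/<mount-name> from option 1 (for option 2, list the wasbs path instead); dbutils.fs.ls returns FileInfo entries with a size field in bytes:

    # A size of 0 here reproduces the problem described in the question
    for f in dbutils.fs.ls('/mnt/<mount-name>/'):
        print(f.name, f.size)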
    

Hope it helps.

Upvotes: 8
