ASH

Reputation: 20342

How can I download a file from blob storage

I have a CSV file in my blob storage that I want to download. The weird thing is, I can't actually see the file. The CSV file is created by a Python job that converts a Python dataframe into a Spark dataframe and exports it.

When I run these two lines of code:

# convert python df to spark df and export the spark df
spark_df = spark.createDataFrame(df)
# Write the Spark DataFrame out as CSV
spark_df.write.csv("dbfs:/rawdata/corp/AAA.csv")

I get this error:

org.apache.spark.sql.AnalysisException: path dbfs:/rawdata/corp/AAA.csv already exists.;

The weird thing is that I can't see the file when I'm using Azure Storage Explorer. Apparently the file exists, even though I can't see it. How can I download this CSV file? I'd prefer to do it from Databricks, but I'm open to other options if someone can suggest a better one.

Thanks.

Upvotes: 1

Views: 2678

Answers (3)

CHEEKATLAPRADEEP

Reputation: 12788

Note: Using the GUI, you can download the full results (up to a maximum of 1 million rows).


To download full results with more than 1 million rows, first save the file to DBFS and then copy it to your local machine using the Databricks CLI, as follows:

dbfs cp "dbfs:/FileStore/tables/AA.csv" "A:\AzureAnalytics"

Reference: Databricks file system
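A likely reason the path "already exists" while no single CSV shows up: `spark_df.write.csv(...)` writes a directory named `AAA.csv` that holds `part-*.csv` files (plus marker files such as `_SUCCESS`), not one CSV file. Below is a minimal sketch of the "save the file to DBFS first" step under stated assumptions: the DataFrame was written with `coalesce(1)` so there is exactly one part file, and the `/dbfs` FUSE mount is available on the driver. The helper name `copy_single_part_csv` is hypothetical; `dbutils.fs.cp` would do the copy too, this version just uses the standard library.

```python
import glob
import os
import shutil


def copy_single_part_csv(spark_output_dir: str, target_file: str) -> str:
    """Copy the single part-*.csv file Spark wrote into spark_output_dir to target_file.

    Hypothetical helper. Assumes the DataFrame was written with coalesce(1),
    so exactly one part file exists; marker files like _SUCCESS are ignored.
    """
    parts = sorted(glob.glob(os.path.join(spark_output_dir, "part-*.csv")))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    # Create the destination folder (e.g. /dbfs/FileStore/tables) if needed.
    os.makedirs(os.path.dirname(target_file) or ".", exist_ok=True)
    shutil.copyfile(parts[0], target_file)
    return target_file
```

In a notebook this might look like `spark_df.coalesce(1).write.mode("overwrite").csv("dbfs:/rawdata/corp/AAA.csv")` (the `overwrite` mode also avoids the "already exists" error), then `copy_single_part_csv("/dbfs/rawdata/corp/AAA.csv", "/dbfs/FileStore/tables/AAA.csv")`, after which `dbfs cp` copies one plain CSV file.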

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Reference: Installing and configuring Azure Databricks CLI
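Because the CLI uses the DBFS API, the `dbfs cp dbfs:/apple.txt ./apple.txt` step can also be sketched in Python against the REST endpoint `GET /api/2.0/dbfs/read`, which returns at most 1 MB of base64-encoded data per call. Assumptions: the workspace URL and personal access token are placeholders, `http_fetch` and `dbfs_download` are hypothetical helper names, and a short read is treated as end of file. It copies one file only, so Spark's output directory needs one call per part file (or `dbfs cp -r`).

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable

CHUNK = 1024 * 1024  # the read endpoint returns at most 1 MB per call


def http_fetch(host: str, token: str) -> Callable[[str, int, int], dict]:
    """Return a function that calls GET /api/2.0/dbfs/read on the given workspace."""
    def fetch(path: str, offset: int, length: int) -> dict:
        query = urllib.parse.urlencode({"path": path, "offset": offset, "length": length})
        req = urllib.request.Request(
            f"{host.rstrip('/')}/api/2.0/dbfs/read?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def dbfs_download(fetch: Callable[[str, int, int], dict], dbfs_path: str, local_path: str) -> int:
    """Copy one DBFS file to local_path chunk by chunk; returns the bytes written."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = fetch(dbfs_path, offset, CHUNK)
            n = reply.get("bytes_read", 0)
            if n:
                out.write(base64.b64decode(reply["data"]))
                offset += n
            if n < CHUNK:  # a short (or empty) read means end of file
                break
    return offset
```

Usage might look like `dbfs_download(http_fetch("https://<databricks-instance>", token), "/FileStore/tables/AA.csv", "AA.csv")`; the API expects an absolute DBFS path without the `dbfs:` scheme. Passing `fetch` in as a function keeps the chunking logic separate from HTTP, so it can be exercised without a workspace.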

Hope this helps.

Upvotes: 2

ASH

Reputation: 20342

I found another nice solution here.

https://docs.databricks.com/notebooks/notebooks-use.html


Just before this step, run this line of code to display the contents of the dataframe:

display(df)
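Besides the download button under `display(df)`, Databricks serves files saved under `dbfs:/FileStore/` at `https://<databricks-instance>/files/...` to a browser that is signed in to the workspace. A small sketch that builds that URL, assuming the data has already been saved as a single CSV under `dbfs:/FileStore/` (the URL serves one file, not Spark's output directory). The helper name `filestore_download_url` and the workspace URL in the test are hypothetical.

```python
def filestore_download_url(workspace_url: str, dbfs_path: str) -> str:
    """Map a dbfs:/FileStore/... path to the browser download URL under /files/."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only files under dbfs:/FileStore/ are served at /files/")
    # dbfs:/FileStore/<rest> is served at <workspace>/files/<rest>
    return f"{workspace_url.rstrip('/')}/files/{dbfs_path[len(prefix):]}"
```

For example, `dbfs:/FileStore/tables/AA.csv` maps to `<workspace>/files/tables/AA.csv`, which can be opened directly in the browser to download the file.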

Upvotes: 1

kgalic

Reputation: 2664

How about using the Azure Blob Storage SDK and the following command:

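from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob 2.x SDK
block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
# Download as a file
block_blob_service.get_blob_to_path(container_name, blob_name, local_file_name)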
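`BlockBlobService` is not part of the current (v12) `azure-storage-blob` package, and `get_blob_to_path` fetches a single blob, while Spark's `write.csv` produces a folder of `part-*.csv` blobs. A minimal sketch for v12, assuming the output folder lives in a container you can reach (for example one mounted into Databricks) and that the CSVs were written without header rows, which is Spark's default. It uses the v12 `ContainerClient.list_blobs(name_starts_with=...)` and `download_blob(...).readinto(...)` calls; the helper name `download_spark_csv` is hypothetical.

```python
from typing import Any, List


def download_spark_csv(container_client: Any, prefix: str, local_file: str) -> List[str]:
    """Concatenate every part-*.csv blob under prefix into one local CSV file.

    container_client is assumed to be an azure.storage.blob.ContainerClient (v12).
    Assumes the part files have no header rows (Spark's default for write.csv).
    """
    prefix = prefix.rstrip("/") + "/"
    names = sorted(
        blob.name
        for blob in container_client.list_blobs(name_starts_with=prefix)
        if blob.name.rsplit("/", 1)[-1].startswith("part-") and blob.name.endswith(".csv")
    )
    with open(local_file, "wb") as out:
        for name in names:
            # Stream each part file straight into the combined local file.
            container_client.download_blob(name).readinto(out)
    return names
```

Usage might look like `container_client = ContainerClient.from_connection_string(conn_str, "mycontainer")` followed by `download_spark_csv(container_client, "rawdata/corp/AAA.csv", "AAA.csv")`, where the connection string and container name are placeholders. Taking the client as a parameter keeps credentials out of the helper.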

Upvotes: 2
