Reputation: 20342
I have a CSV file in my blob storage that I want to download. The weird thing is, I can't actually see the file. The CSV is created by a Python job that converts a pandas dataframe into a Spark dataframe and writes it out.
When I run these 2 lines of code:
# convert python df to spark df and export the spark df
spark_df = spark.createDataFrame(df)
# Write the Spark dataframe out as CSV
spark_df.write.csv("dbfs:/rawdata/corp/AAA.csv")
I get this error:
org.apache.spark.sql.AnalysisException: path dbfs:/rawdata/corp/AAA.csv already exists.;
The weird thing is that I can't see the file in Azure Storage Explorer. Apparently the file exists, even though I can't see it. How can I download this CSV file? I would prefer to use Databricks, but I'm open to something else if someone can suggest a better option.
Thanks.
Upvotes: 1
Views: 2678
Reputation: 12788
Note: Using the GUI, you can download full results (up to a maximum of 1 million rows).
To download full results with more than 1 million rows, first save the file to DBFS and then copy the file to your local machine using the Databricks CLI, as follows:
dbfs cp "dbfs:/FileStore/tables/AA.csv" "A:\AzureAnalytics"
Reference: Databricks file system
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
Reference: Installing and configuring Azure Databricks CLI
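If the CLI is not set up yet, a minimal sketch of the install and configuration steps (databricks configure prompts for your workspace URL and a personal access token):
# Install the Databricks CLI (provides the dbfs command)
pip install databricks-cli
# Configure authentication; prompts for host and personal access token
databricks configure --token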
Hope this helps.
Upvotes: 2
Reputation: 20342
I found another nice solution here.
https://docs.databricks.com/notebooks/notebooks-use.html
To display the contents of a dataframe, run this line of code:
display(df)
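In the Databricks notebook UI, the results table rendered by display() includes a download option (subject to the 1 million row limit mentioned in the answer above). A minimal sketch, reading back the CSV directory written earlier (illustrative path from the question):
# Read the directory of part files Spark wrote, then render it;
# the displayed results table has a download button
csv_df = spark.read.csv("dbfs:/rawdata/corp/AAA.csv")
display(csv_df)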
Upvotes: 1