Tom Becker

Reputation: 311

How to export data from a DataFrame to a file in Databricks

I'm currently taking the Introduction to Spark course on edX. Is it possible to save DataFrames from Databricks to my computer?

I'm asking because the course provides Databricks notebooks that probably won't work after the course ends.

In the notebook, the data is imported with this command:

import os
log_file_path = 'dbfs:/' + os.path.join('databricks-datasets', 'cs100', 'lab2', 'data-001', 'apache.access.log.PROJECT')
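For reference, here is a small sketch of what that expression evaluates to. It uses `posixpath.join` (which is what `os.path.join` is on the Linux machines Databricks runs on) so the result is the same whatever OS you try it on. The point is that the value is a DBFS URI on the Databricks side, not a path on your own computer.

```python
import posixpath

# Same construction as in the notebook; posixpath.join always joins with "/"
# (on a Linux driver, os.path is posixpath, so the result is identical).
log_file_path = 'dbfs:/' + posixpath.join(
    'databricks-datasets', 'cs100', 'lab2', 'data-001', 'apache.access.log.PROJECT'
)

print(log_file_path)
# dbfs:/databricks-datasets/cs100/lab2/data-001/apache.access.log.PROJECT
```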

I found this solution but it doesn't work:

df.select('year','model').write.format('com.databricks.spark.csv').save('newcars.csv')

Upvotes: 19

Views: 95749

Answers (3)

Triamus

Reputation: 2515

You can also save it to the FileStore and download it via its handle, e.g.

df.coalesce(1).write.format("com.databricks.spark.csv").option("header", "true").save("dbfs:/FileStore/df/df.csv")

You can find the handle in the Databricks GUI by going to Data > Add Data > DBFS > FileStore > your_subdirectory > part-00000-...
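Note that `save("dbfs:/FileStore/df/df.csv")` creates a *directory* named `df.csv` containing one `part-*` file per partition (plus marker files such as `_SUCCESS`); `coalesce(1)` is what keeps it down to a single part file. If you skip `coalesce(1)` and download several part files, a sketch like the following could stitch them back into one CSV. It assumes the parts were written with `option("header", "true")` (so each part starts with the same header row) and have already been copied into a local folder; `merge_part_files` is a hypothetical helper, not part of Spark or Databricks.

```python
import csv
from pathlib import Path


def merge_part_files(part_dir, out_file):
    """Concatenate Spark CSV part files into one CSV with a single header.

    Hypothetical helper: assumes every part-* file begins with the same
    header row. Marker files such as _SUCCESS don't match "part-*" and
    are skipped. Returns the number of part files found.
    """
    parts = sorted(Path(part_dir).glob("part-*"))
    header_written = False
    with open(out_file, "w", newline="") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # empty part file, nothing to copy
                if not header_written:
                    writer.writerow(header)  # keep only the first header
                    header_written = True
                writer.writerows(reader)  # data rows after the header
    return len(parts)
```

Sorting the file names keeps the parts in partition order (`part-00000`, `part-00001`, ...), which is the order Spark wrote them in.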

For a Databricks instance in West Europe, the download link in this case is:

https://westeurope.azuredatabricks.net/files/df/df.csv/part-00000-tid-437462250085757671-965891ca-ac1f-4789-85b0-akq7bc6a8780-3597-1-c000.csv
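The link follows a simple pattern: a file stored under `dbfs:/FileStore/<path>` is served at `https://<databricks-instance>/files/<path>`. A minimal sketch of that mapping, where `filestore_url` is a hypothetical helper and the instance name is whatever host your workspace uses (on Community Edition the link may additionally need an `?o=<workspace-id>` query parameter):

```python
PREFIX = "dbfs:/FileStore/"


def filestore_url(instance, dbfs_path):
    """Map a dbfs:/FileStore/... path to its /files/... download URL.

    Hypothetical helper; only files under /FileStore are served this way.
    """
    if not dbfs_path.startswith(PREFIX):
        raise ValueError("only files under dbfs:/FileStore/ are served via /files/")
    relative = dbfs_path[len(PREFIX):]  # strip "dbfs:/FileStore/"
    return f"https://{instance}/files/{relative}"
```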

I haven't tested it, but I would assume that the 1-million-row limit you hit when downloading via the answer from @MrChristine does not apply here.

Upvotes: 19

MrChristine

Reputation: 1551

Databricks runs on a cloud VM and has no way of knowing where your local machine is. If you want to save the results of a DataFrame as CSV, you can run display(df), which offers an option to download the results.
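When the DataFrame is small enough to fit in the driver's memory, another option is to collect it and write the CSV text yourself, then download that file (for example from `/FileStore`, as in the answer above). PySpark's `Row` objects are tuples, so `df.columns` and `df.collect()` can be passed straight to Python's `csv` module. A sketch, with `rows_to_csv` as a hypothetical helper and plain tuples standing in for collected rows:

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Render a header plus rows as CSV text.

    In PySpark you might call rows_to_csv(df.columns, df.collect()); that
    pulls every row onto the driver, so only use it for small results.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)  # fields containing commas/quotes get quoted
    return buf.getvalue()


# Plain tuples stand in for collected Row objects here.
text = rows_to_csv(["year", "model"], [(2012, "Tesla"), (1997, "Ford")])
print(text)
```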


Upvotes: 54

yoga

Reputation: 1959

Try this.

df.write.format("com.databricks.spark.csv").save("file:///home/yphani/datacsv")

This saves the output to the local file system of the Unix server rather than to HDFS.

If you pass only /home/yphani/datacsv, Spark looks for the path on HDFS.
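The difference comes down to the URI scheme: a path with an explicit scheme (`file://`, `dbfs:/`, `hdfs://`) goes to that file system, while a bare path is resolved against the cluster's default file system (HDFS in the setup described here; on Databricks the default is DBFS). A small sketch of that rule using `urllib.parse`; `target_filesystem` and its `default` argument are hypothetical, not a Spark API:

```python
from urllib.parse import urlparse


def target_filesystem(path, default="hdfs"):
    """Return the file system a Spark-style path would resolve to.

    An explicit scheme wins; a bare path falls back to the default
    file system (hypothetical helper illustrating the rule).
    """
    scheme = urlparse(path).scheme  # "" when the path has no scheme
    return scheme or default
```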

Upvotes: 2
