Reputation: 539
How can I export Spark's DataFrame to a CSV file using Scala?
Upvotes: 11
Views: 56397
Reputation: 2436
A method that exports the DataFrame and renames the resulting file (the DataFrame to export is passed in as a parameter):
import org.apache.spark.sql.DataFrame

// Writes the DataFrame to a temporary directory as a single CSV,
// then copies the part file to filePath + fileName + ".csv"
// (uses Databricks dbutils).
def export_csv(
    df: DataFrame,
    fileName: String,
    filePath: String
): Unit = {
  val filePathDestTemp = filePath + ".dir/"
  df.coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(filePathDestTemp)
  // Spark writes a part-*.csv file inside the directory; copy it out.
  for (subFile <- dbutils.fs.ls(filePathDestTemp)) {
    if (subFile.name.endsWith(".csv")) {
      dbutils.fs.cp(filePathDestTemp + subFile.name, filePath + fileName + ".csv")
    }
  }
  dbutils.fs.rm(filePathDestTemp, recurse = true)
}
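For example, assuming a DataFrame named df already exists (the name and paths here are illustrative):

// copies the single part file to /mnt/output/my_report.csv
export_csv(df, "my_report", "/mnt/output/")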
Upvotes: 0
Reputation: 13650
The easiest way to do this is to use the spark-csv library. You can check the documentation in the provided link, and here is a Scala example of how to load and save data from/to a DataFrame.
Code (Spark 1.4+):
dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")
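A matching load call looks like this; a minimal sketch assuming a Spark 1.4-era sqlContext (the path is a placeholder):

// read a CSV file into a DataFrame with spark-csv
val dataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("myFile.csv")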
Edit:
Spark creates part files when saving CSV data; if you want to merge the part files into a single CSV, refer to the following:
Merge Spark's CSV output folder to Single File
Upvotes: 16
Reputation: 5152
The solution above exports the CSV as multiple part files. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into a single CSV file when you use coalesce.
df.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("/your/location/mydata")
This creates a directory named mydata, inside which you'll find a csv file containing the results.
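If you need the result as a single named file rather than a directory, one option is to rename the part file afterwards. A minimal sketch using Hadoop's FileSystem API, assuming a SparkSession named spark (paths are placeholders):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// coalesce(1) produced exactly one part file in the output directory
val partFile = fs.globStatus(new Path("/your/location/mydata/part-*"))(0).getPath
fs.rename(partFile, new Path("/your/location/mydata.csv"))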
Upvotes: 13
Reputation: 3919
In Spark versions 2+ you can simply use the following:
df.write.csv("/your/location/data.csv")
If you want to make sure that the output is not split into multiple part files, add a .coalesce(1) as follows:
df.coalesce(1).write.csv("/your/location/data.csv")
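Note that .csv(...) still creates a directory at the given path containing the part file. To write a header row and read the data back, here is a small sketch assuming a SparkSession named spark (the path is illustrative):

// write with a header row
df.coalesce(1)
  .write
  .option("header", "true")
  .csv("/your/location/data.csv")

// read it back, using the header for column names
val readBack = spark.read
  .option("header", "true")
  .csv("/your/location/data.csv")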
Upvotes: 15