Tong

Reputation: 539

How to export DataFrame to csv in Scala?

How can I export a Spark DataFrame to a CSV file using Scala?

Upvotes: 11

Views: 56397

Answers (4)

Luiz Viola

Reputation: 2436

A method to export the DataFrame as a single CSV and rename the file (for Databricks, since it uses dbutils):

import org.apache.spark.sql.DataFrame

// Writes the DataFrame to a temporary directory as a single part-file,
// then copies that part-file to filePath + fileName + ".csv".
def export_csv(
    df: DataFrame,
    fileName: String,
    filePath: String
): Unit = {

  val filePathDestTemp = filePath + ".dir/"

  df
    .coalesce(1)
    .write
    .option("header", "true")
    .mode("overwrite")
    .csv(filePathDestTemp)

  // Locate the single .csv part-file Spark produced and copy it to the target name
  val listFiles = dbutils.fs.ls(filePathDestTemp)
  for (subFile <- listFiles) {
    if (subFile.name.endsWith(".csv")) {
      dbutils.fs.cp(filePathDestTemp + subFile.name, filePath + fileName + ".csv")
    }
  }

  // Remove the temporary directory
  dbutils.fs.rm(filePathDestTemp, recurse = true)
}

Upvotes: 0

karthik manchala

Reputation: 13650

The easiest and best way to do this is to use the spark-csv library. You can check the documentation at the provided link; here is a Scala example of how to save and load data to/from a DataFrame.

Code (Spark 1.4+):

dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")
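
For completeness, reading a CSV back into a DataFrame with the same library looks roughly like this (a sketch assuming a sqlContext and the file name used above):

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")       // use the first line as column names
  .option("inferSchema", "true")  // infer column types automatically
  .load("myFile.csv")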

Edit:

Spark creates part-files when saving CSV data. If you want to merge the part-files into a single CSV, refer to the following:

Merge Spark's CSV output folder to Single File
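
For reference, a common approach from that thread is Hadoop's FileUtil.copyMerge (available in Hadoop 2.x; it was removed in Hadoop 3). A minimal sketch, assuming srcDir is Spark's output folder and dstFile is the single CSV you want:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Concatenate every part-file in srcDir into a single dstFile.
// Note: if each part-file was written with a header, the header will be repeated.
def mergeCsvParts(srcDir: String, dstFile: String): Unit = {
  val conf = new Configuration()
  val fs = FileSystem.get(conf)
  FileUtil.copyMerge(fs, new Path(srcDir), fs, new Path(dstFile), true, conf, null)
}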

Upvotes: 16

Abu Shoeb

Reputation: 5152

The above solution exports the CSV as multiple part-files. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into a single CSV file when you use coalesce.

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/your/location/mydata")

This will create a directory named mydata in which you'll find a single CSV file containing the results.

Upvotes: 13

Taylrl

Reputation: 3919

In Spark versions 2+ you can simply use the following:

df.write.csv("/your/location/data.csv")

If you want to make sure the output is no longer split across multiple part-files, add a .coalesce(1) as follows:

df.coalesce(1).write.csv("/your/location/data.csv")
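
If you also need a header row, or want to overwrite an existing output directory, the built-in CSV writer accepts the same options shown in the answers above; a small sketch:

df.coalesce(1)
  .write
  .option("header", "true")   // write column names as the first line
  .mode("overwrite")          // replace the output directory if it already exists
  .csv("/your/location/data.csv")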

Upvotes: 15
