Custom file name to write dataframe in PySpark

Question

I want to write the records of dataframe. The records are in json format. So I need to write the content into the file with my custom file name not as part-0000-cfhbhgh.json.

Ram Ghadiyaram · Accepted Answer

I am giving answer in scala but in python also these are the essential steps..

 import org.apache.hadoop.fs.{FileSystem, Path}

  val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration);
  val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
  println("file name " + file)
  fs.rename(
    new Path("data/jsonexample/" + file)
    , new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))

Full Example :

 import spark.implicits._

  val df = Seq(
    (123, "ITA", 1475600500, 18.0),
    (123, "ITA", 1475600500, 18.0),
    (123, "ITA", 1475600516, 19.0)
  ).toDF("Value", "Country", "Timestamp", "Sum")
  df.coalesce(1)
    .write
    .mode(SaveMode.Overwrite)
    .json("data/jsonexample/")

  import org.apache.hadoop.fs.{FileSystem, Path}

  val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration);
  val file = fs.globStatus(new Path("data/jsonexample/part*"))(0).getPath().getName()
  println("file name " + file)
  fs.rename(
    new Path("data/jsonexample/" + file)
    , new Path("data/jsonexample/tsuresh97_json_toberenamed.json"))

Result :

json content :

{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600500,"Sum":18.0}
{"Value":123,"Country":"ITA","Timestamp":1475600516,"Sum":19.0}

Custom file name to write dataframe in PySpark

Answers (1)

Related Questions