User9102d82

Reputation: 1190

Write either csv output OR parquet output, controlled via a configuration setting

I would like my program to write output files in either CSV or Parquet format, and the choice of format should be controlled via a configuration setting.

I could use something like the snippet below.

// I would probably read opType from a JSON or XML config (see the sketch below).
val opType = "csv"

// Write output based on the appropriate opType
opType match {
  case "csv" =>
    df.write.csv("/some/output/location")
  case "parquet" =>
    df.write.parquet("/some/output/location")
  case _ =>
    df.write.csv("/some/output/location")
}
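
If the setting really does come from a configuration file, a minimal sketch using the Typesafe Config library (the output.format key name is just an assumption for illustration) might look like:

import com.typesafe.config.ConfigFactory

// application.conf on the classpath might contain: output.format = "csv"
val config = ConfigFactory.load()
val opType = config.getString("output.format")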

Question: Is there a better way to handle this scenario? Is there any way I could use the string value of opType to call the appropriate function, whether parquet or csv?

Any help or pointers are appreciated.

Upvotes: 2

Views: 37

Answers (1)

kavetiraviteja

Reputation: 2208

Create an enum of the possible file types and make sure the enum values match Spark's data source keywords (i.e. csv, parquet, orc, json, text, etc.).

Then you can simply do:

df.write.format(opType).save(opPath)

Note: the enum is used only for validation, to make sure the input is not an incorrect or garbled value.
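
For completeness, a minimal sketch of that idea (the OutputFormat enum, the writeOutput helper, and the opPath parameter are illustrative names, not part of Spark):

import org.apache.spark.sql.DataFrame

// Enumeration whose value names match Spark's built-in data source keywords.
object OutputFormat extends Enumeration {
  val csv, parquet, orc, json, text = Value
}

def writeOutput(df: DataFrame, opType: String, opPath: String): Unit = {
  // withName throws NoSuchElementException for anything outside the enum,
  // so a bad config value fails fast instead of reaching Spark.
  val format = OutputFormat.withName(opType.toLowerCase)
  df.write.format(format.toString).save(opPath)
}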

Upvotes: 3
