koh-ding

Reputation: 135

Not able to write to CSV with header using Spark Scala

I've never had this issue before, but when I write a DataFrame to CSV in Spark Scala, the output CSV file comes out in a completely wrong format: (1) it has no header row, and (2) there are large, seemingly random blank gaps between the columns.

The funny part is that when I call df.show in the IDE, the output looks fine, with the header and proper formatting.

I'm using a very basic, generic write-to-CSV call:

df.write.csv("output.csv")

Why could this be happening? Is it because the joins and merges I'm doing are distributed across the cluster, so the data can't be reformatted properly before it is written to CSV?

Upvotes: 0

Views: 1932

Answers (1)

maxime G

Reputation: 1771

You are missing some options. The CSV writer supports the following:

  • sep (default ,): sets a single character as a separator for each field and value.
  • quote (default "): sets a single character used for escaping quoted values where the separator can be part of the value. If an empty string is set, it uses \u0000 (null character).
  • escape (default \): sets a single character used for escaping quotes inside an already quoted value.
  • charToEscapeQuoteEscaping (default escape or \0): sets a single character used for escaping the escape for the quote character. The default value is escape character when escape and quote characters are different, \0 otherwise.
  • escapeQuotes (default true): a flag indicating whether values containing quotes should always be enclosed in quotes. Default is to escape all values containing a quote character.
  • quoteAll (default false): a flag indicating whether all values should always be enclosed in quotes. Default is to only escape values containing a quote character.
  • header (default false): writes the names of columns as the first line.
  • nullValue (default empty string): sets the string representation of a null value.
  • compression (default null): compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (none, bzip2, gzip, lz4, snappy and deflate).
  • dateFormat (default yyyy-MM-dd): sets the string that indicates a date format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to date type.
  • timestampFormat (default yyyy-MM-dd'T'HH:mm:ss.SSSXXX): sets the string that indicates a timestamp format. Custom date formats follow the formats at java.text.SimpleDateFormat. This applies to timestamp type.
  • ignoreLeadingWhiteSpace (default true): a flag indicating whether or not leading whitespaces from values being written should be skipped.
  • ignoreTrailingWhiteSpace (default true): a flag indicating whether or not trailing whitespaces from values being written should be skipped.
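As a rough illustration of how several of the options listed above can be chained on one writer, here is a minimal sketch. The object name CsvWriteOptions, the helper writeWithOptions, the sample data, and the output path are all made up for the example, and it assumes a local Spark 2.x or later session; it is not meant as the one right configuration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvWriteOptions {
  // Hypothetical helper: applies a handful of the writer options in one chain.
  def writeWithOptions(df: DataFrame, path: String): Unit =
    df.write
      .mode("overwrite")            // replace the output directory if it already exists
      .option("header", "true")     // write column names as the first line
      .option("sep", ";")           // use ';' instead of ',' as the field separator
      .option("quoteAll", "true")   // enclose every value in quotes
      .option("nullValue", "NA")    // string written in place of nulls
      .csv(path)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying the options out.
    val spark = SparkSession.builder()
      .appName("csv-write-options")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, including a value containing a comma and a null.
    val df = Seq[(Int, String, Option[String])](
      (1, "alice", Some("x,y")),
      (2, "bob", None)
    ).toDF("id", "name", "note")

    writeWithOptions(df, "output_options_csv")
    spark.stop()
  }
}
```

Each option call returns the writer, so options can be added or dropped independently; only header is needed for the problem in the question.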

In your case:

df.write.option("header","true").csv("output.csv")
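To check that the header really ends up in the output, one option is to write a small DataFrame and read it straight back. The sketch below does that; the object name CsvHeaderRoundTrip, the helper writeWithHeader, the sample data, and the path are hypothetical. Note that the path passed to csv(...) becomes a directory of part files rather than a single file, and the coalesce(1) here is only an assumption that the data is small enough to fit in one part file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvHeaderRoundTrip {
  // Hypothetical helper: writes one part file with a header line.
  def writeWithHeader(df: DataFrame, path: String): Unit =
    df.coalesce(1)                 // assumption: data is small enough for one partition
      .write
      .mode("overwrite")           // replace the output directory if it already exists
      .option("header", "true")    // without this, no header row is written
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-header-round-trip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the joined DataFrame.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    val outDir = "output_with_header_csv"
    writeWithHeader(df, outDir)

    // Read the directory back; header=true makes Spark use the first line as column names.
    val readBack = spark.read.option("header", "true").csv(outDir)
    readBack.printSchema()
    readBack.show()

    spark.stop()
  }
}
```

If the read-back column names match the original ones, the header was written; opening the part file inside the output directory in a text editor shows the same thing.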

Upvotes: 3
