Shivam

Reputation: 243

How to remove double quotes from column name while saving dataframe in csv in spark?

I am saving a Spark dataframe to a CSV file. All the records are saved in double quotes, which is fine, but the column names also come wrapped in double quotes. Could you please help me remove them?

Example:

"Source_System"|"Date"|"Market_Volume"|"Volume_Units"|"Market_Value"|"Value_Currency"|"Sales_Channel"|"Competitor_Name"
"IMS"|"20080628"|"183.0"|"16470.0"|"165653.256349"|"AUD"|"AUSTRALIA HOSPITAL"|"PFIZER"

Desirable Output:

Source_System|Date|Market_Volume|Volume_Units|Market_Value|Value_Currency|Sales_Channel|Competitor_Name
"IMS"|"20080628"|"183.0"|"16470.0"|"165653.256349"|"AUD"|"AUSTRALIA HOSPITAL"|"PFIZER"

I am using below code:

df4.repartition(1).write.csv(Output_Path_ASPAC, quote='"', header=True, quoteAll=True, sep='|', mode='overwrite')

Upvotes: 2

Views: 3660

Answers (1)

notNull

Reputation: 31470

I think the only workaround would be to concatenate the quotes onto the column values in the dataframe before writing to CSV.

Example:

df.show()
#+---+----+------+
#| id|name|salary|
#+---+----+------+
#|  1|   a|   100|
#+---+----+------+

from pyspark.sql.functions import col, concat, lit

cols = [concat(lit('"'), col(i), lit('"')).alias(i) for i in df.columns]
df1 = df.select(*cols)

df1.show()
#+---+----+------+
#| id|name|salary|
#+---+----+------+
#|"1"| "a"| "100"|
#+---+----+------+

df1.\
    write.\
    csv("<path>", header=True, sep='|', escape='', quote='', mode='overwrite')

#output
#cat tmp4/part*
#id|name|salary
#"1"|"a"|"100"

Upvotes: 1
