Reputation: 3271
Problem: When writing a DataFrame as CSV, I do not want quotes to be escaped. However, setting escapeQuotes=False
does not seem to have any effect.
Below is an example case:
DataPrep:
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql import SparkSession, functions as func

spark = SparkSession.builder.appName("test").getOrCreate()

# Two sample rows: (firstname, lastname)
data = [
    ("James", "Smith"),
    ("Michael", "Rose"),
]

# Both columns are nullable strings
schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("lastname", StringType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)
df.show(truncate=False)
Output:
+---------+--------+
|firstname|lastname|
+---------+--------+
|James    |Smith   |
|Michael  |Rose    |
+---------+--------+
Adding column with Newline Character
def create_column_with_newline(elem):
    # Join first and last name with a newline and wrap the result in literal double quotes
    return f'"{elem["firstname"]}\n{elem["lastname"]}"'
columnWithNewlineUDF = func.udf(create_column_with_newline)
df = df.withColumn('newline_col', columnWithNewlineUDF(func.struct('firstname', 'lastname')))
df.show()
Output:
+---------+--------+-----------------+
|firstname|lastname| newline_col|
+---------+--------+-----------------+
| James| Smith| "James
Smith"|
| Michael| Rose| "Michael
Rose"|
+---------+--------+-----------------+
Writing csv with escapeQuotes=False
df.coalesce(1).write.csv('test.tsv', mode='overwrite', sep='\t', header=True, encoding='UTF-8', escapeQuotes=False)
Output:
firstname lastname newline_col
James Smith "\"James
Smith\""
Michael Rose "\"Michael
Rose\""
As you can see, newline_col is written with escaped quotes :-(
Expected Output:
firstname lastname newline_col
James Smith "James
Smith"
Michael Rose "Michael
Rose"
Upvotes: 2
Views: 1435
Reputation: 137
Setting these two options on the writer worked for me:
.option("quote", "")
.option("escapeQuotes", False)
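In PySpark, the two options above might be chained onto the writer from the question roughly as in the sketch below. The helper name write_tsv_without_quoting and its path argument are made up for illustration, and df is assumed to be the DataFrame built in the question. One caveat: Spark's CSV option documentation says that an empty quote string is replaced by the null character (\u0000) when writing, so it is worth inspecting the resulting file before relying on this.

```python
def write_tsv_without_quoting(df, path):
    """Hypothetical helper: write `df` as a tab-separated file with the
    quote character set to an empty string and escapeQuotes turned off.

    `df` is assumed to be a pyspark.sql.DataFrame; nothing is imported here
    because only methods on the object passed in are called.
    """
    (
        df.coalesce(1)                   # single output file, as in the question
        .write
        .option("quote", "")             # empty quote string, as suggested above
        .option("escapeQuotes", False)   # Python spelling of the boolean
        .csv(path, mode="overwrite", sep="\t", header=True, encoding="UTF-8")
    )
```

Usage would be write_tsv_without_quoting(df, 'test.tsv'). The keyword arguments passed to csv() leave quote and escapeQuotes unset, so the values given through option() should stay in effect.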
Upvotes: 1
Reputation: 4069
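Just remove the quotes from the UDF. The CSV writer already encloses a field that contains a newline in quotes, so the quotes you add yourself become part of the value and get escaped:
def create_column_with_newline(elem):
    # was: return f'"{elem["firstname"]}\n{elem["lastname"]}"'
    return f'{elem["firstname"]}\n{elem["lastname"]}'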
Output:
firstname lastname newline_col
James Smith "James
Smith"
Michael Rose "Michael
Rose"
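To see why this works without starting Spark, here is a rough analogy using Python's standard csv module. This is not Spark's writer. It is only configured to resemble it (tab separator, '"' as the quote character, backslash as the escape character), and the helper name write_row is made up for the example. The point is that a writer of this kind quotes a field containing a newline by itself, so quotes that are already inside the value are treated as data and escaped.

```python
import csv
import io


def write_row(fields):
    """Serialize one row with settings that resemble the question's writer."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quotechar='"',
        escapechar="\\",
        doublequote=False,          # escape embedded quotes instead of doubling them
        quoting=csv.QUOTE_MINIMAL,  # quote only the fields that need it
        lineterminator="\n",
    )
    writer.writerow(fields)
    return buf.getvalue()


# Value built with literal quotes, as in the question's UDF
with_quotes = write_row(["James", "Smith", '"James\nSmith"'])

# Value built without quotes, as in the UDF from this answer
without_quotes = write_row(["James", "Smith", "James\nSmith"])

print(with_quotes)
print(without_quotes)
```

The first row comes out with backslash-escaped inner quotes, like the output in the question. The second row is quoted once by the writer, which is the expected output.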
[Image: Excel visualization of the output]
Upvotes: 1