PySpark: Reading CSV files with fields containing double quotes and commas

Question

I have a CSV file that I am reading with PySpark and loading into PostgreSQL. One of its fields contains strings that have commas and double quotes inside them, as in the examples below:

1. "RACER ""K"", P.L. 9"
2. "JENIS, B. S. ""N"" JENIS, F. T. ""B"" 5"

PySpark parses these fields as shown below. This causes a problem: the values end up in the wrong columns when I load the data into PostgreSQL, and the script fails.

1. '\"RACER \"\"K\"\"'
2. '\"JENIS, B. S. \"\"N\"\" JENIS'
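For comparison, here is a small check of what the two sample fields should decode to, assuming the file follows the usual CSV convention in which a doubled quote inside a quoted field stands for one literal quote. It uses only Python's standard `csv` module, not Spark, and the variable names are just for illustration.

```python
import csv
import io

# The two sample fields from above, one per line, exactly as they appear in the file.
raw = (
    '"RACER ""K"", P.L. 9"\n'
    '"JENIS, B. S. ""N"" JENIS, F. T. ""B"" 5"\n'
)

# csv.reader defaults to quotechar='"' and doublequote=True, so a doubled
# quote inside a quoted field is read as a single literal quote and the
# commas inside the quotes do not split the field.
rows = list(csv.reader(io.StringIO(raw)))

for row in rows:
    print(len(row), row)
```

Each line comes back as a single field, which is the result I would like to get from PySpark as well.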

I am using Spark 2.42. How can this situation be handled in PySpark? Basically, I want the program to ignore commas and double quotes when they appear inside a double-quoted field.

Answers (1)
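
One approach that may work here, assuming the file doubles a quote to represent a literal quote inside a quoted field (as the samples suggest): tell Spark's CSV reader that the escape character is also `"`. By default the reader uses a backslash as the escape character, so `""` is not recognized as an escaped quote. The sketch below is not tested against Spark 2.42 or your file. The helper name `read_quoted_csv` and the path in the usage comment are hypothetical; `quote` and `escape` are existing options of the CSV reader.

```python
# Reader options: use " as both the quote character and the escape
# character, so that "" inside a quoted field is read as one literal ".
CSV_OPTIONS = {
    "quote": '"',
    "escape": '"',
}


def read_quoted_csv(spark, path):
    """Read a CSV file whose quoted fields contain commas and doubled quotes.

    `spark` is an existing SparkSession; `path` is the location of the file.
    """
    return spark.read.options(**CSV_OPTIONS).csv(path)


# Usage (hypothetical path; add header or schema options as your file needs):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = read_quoted_csv(spark, "/path/to/input.csv")
#   df.show(truncate=False)
```

If some quoted fields also contain line breaks, the reader's `multiLine` option may be needed as well, but the samples in the question do not show that.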
