matf

Reputation: 182

How to handle double quotes inside field values with csv module?

I'm trying to parse CSV files from an external system that I have no control over.

Example CSV:

qw""erty,"a""b""c""d,ef""""g"

Should be parsed as:

[['qw"erty', 'a"b"c"d,ef""g']]

However, I think Python's csv module does not expect quote characters to be escaped when the cell was not wrapped in quote characters in the first place. csv.reader(my_file) (with the default doublequote=True) returns:

['qw""erty', 'a"b"c"d,ef""g']
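For anyone who wants to reproduce this, here is a minimal sketch that feeds the sample line to csv.reader from an in-memory string. It assumes Python 3; the variable names `sample` and `rows` are only for illustration.

```python
import csv
from io import StringIO

# The sample line from the question, as a Python string.
sample = 'qw""erty,"a""b""c""d,ef""""g"\n'

# Default dialect: doubled quotes are only collapsed inside quoted fields,
# so the unquoted first field keeps both quote characters.
rows = list(csv.reader(StringIO(sample)))
print(rows)  # [['qw""erty', 'a"b"c"d,ef""g']]
```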

Is there any way to parse this with Python's csv module?

Upvotes: 7

Views: 16924

Answers (2)

matf

Reputation: 182

This follows up on @JackManey's comment, which suggested replacing every instance of '""' inside double quotes with '\\"'.

Detecting whether we are currently inside a double-quoted cell turned out to be unnecessary: we can simply replace every instance of '""' with '\\"'. The Python documentation says:

On reading, the escapechar removes any special meaning from the following character

However, this still breaks when the original cell already contains escape characters, for example 'qw\\\\""erty' producing [['qw\\"erty']]. So we also have to escape the escape characters before parsing.

Final solution:

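import csv
from io import StringIO
def parse_csv(file_path):
    with open(file_path, newline='') as f:
        # Escape existing backslashes first, then turn each "" into \"
        content = f.read().replace('\\', '\\\\').replace('""', '\\"')
    reader = csv.reader(StringIO(content), doublequote=False, escapechar='\\')
    return list(reader)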
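As a quick check, the same two replacements can be applied to the question's sample in memory. This is only a sketch for Python 3, and `parse_doubled_quotes` is a hypothetical helper name, not part of the csv module.

```python
import csv
from io import StringIO

def parse_doubled_quotes(text):
    # Hypothetical helper: escape backslashes first, then rewrite "" as \"
    escaped = text.replace('\\', '\\\\').replace('""', '\\"')
    reader = csv.reader(StringIO(escaped), doublequote=False, escapechar='\\')
    return list(reader)

sample = 'qw""erty,"a""b""c""d,ef""""g"\n'
print(parse_doubled_quotes(sample))  # [['qw"erty', 'a"b"c"d,ef""g']]

# An empty quoted field is also spelled "", so it gets rewritten as well
# and comes back as a field holding one literal quote character.
print(parse_doubled_quotes('a,"",b\n'))  # [['a', '"', 'b']]
```

One caveat follows from the blanket replacement: if the external system can emit empty quoted fields, they would need separate handling, since they cannot be told apart from an escaped quote by string replacement alone.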

Upvotes: 5

Haleemur Ali

Reputation: 28233

As @JackManey suggests, after reading the file you can replace each pair of double quotes with a single double quote.

my_file_onequote = [[col.replace('""', '"') for col in row] for row in csv.reader(my_file)]
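For comparison, here is a sketch of this per-field replacement run on the question's sample, assuming Python 3 and an in-memory string instead of a file. It fixes the unquoted first field, but the quoted second field has already been unescaped by the reader, so its remaining '""' is collapsed as well and no longer matches the expected 'ef""g'.

```python
import csv
from io import StringIO

sample = 'qw""erty,"a""b""c""d,ef""""g"\n'

# Parse with the default dialect, then collapse doubled quotes in every field.
rows = [[col.replace('""', '"') for col in row]
        for row in csv.reader(StringIO(sample))]
print(rows)  # [['qw"erty', 'a"b"c"d,ef"g']]
```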

Upvotes: 0
