Nigini

Reputation: 484

How to ignore a CSV delimiter within a quoted field?

I have a CSV data dump that is formatted like this:

"field1";"{"JSON-KEY": "JSON-VALUE"}";"field3"

If I use the CSV Python reader like this...

csv.reader(csvfile, delimiter=';', quotechar='"')

I run into two problems:

(1) When a JSON-VALUE string contains the delimiter character ';', the reader treats it as a delimiter and breaks the value into two fields.

(2) When (1) is not a problem, the JSON field is misread as having one quote fewer at the beginning and one more at the end. For instance:

 ['field1','{JSON-KEY": "JSON-VALUE"}"','field3']

These two problems are probably related, but I couldn't fix them using the Python documentation or other questions here. Does anyone have a lead on what I am missing and how I can configure the reader to handle this?

Upvotes: 1

Views: 1731

Answers (1)

Danny_ds

Reputation: 11406

Actually, the CSV data is invalid. Quotes inside a quoted field should be escaped by doubling them, like this:

"field1";"{""JSON-KEY"": ""JSON-VALUE""}";"field3"
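To illustrate, here is a minimal sketch showing that a line escaped this way is read correctly by the reader from the question, even when the JSON value contains a ';'. The sample value "JSON-VALUE; with semicolon" is made up for the demonstration.

```python
import csv
import io
import json

# Correctly escaped line: quotes inside the quoted field are doubled.
# The ';' inside the JSON value is made-up sample data.
line = '"field1";"{""JSON-KEY"": ""JSON-VALUE; with semicolon""}";"field3"\n'

reader = csv.reader(io.StringIO(line), delimiter=';', quotechar='"')
fields = next(reader)

# The middle field is now plain JSON text and can be parsed as such.
payload = json.loads(fields[1])

print(fields)
print(payload)
```

The doubled quotes are collapsed to single quotes by the reader, and the ';' stays inside the field because it appears between the enclosing quotes.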

If you have no control over the generation of the CSV data, you could try disabling quote handling (quoting=csv.QUOTE_NONE; an empty quotechar='' is rejected by recent Python versions) and then trim the quotes from the fields.
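A rough sketch of that workaround follows. It assumes quote handling is switched off with quoting=csv.QUOTE_NONE (swapped in here as the way to make the reader ignore quotes entirely) and that every field is wrapped in exactly one pair of quotes, as in the sample line from the question.

```python
import csv
import io

# The invalid line from the question, with unescaped inner quotes.
line = '"field1";"{"JSON-KEY": "JSON-VALUE"}";"field3"\n'

# QUOTE_NONE makes the reader split on ';' only and leave quotes untouched.
reader = csv.reader(io.StringIO(line), delimiter=';', quoting=csv.QUOTE_NONE)
raw = next(reader)

# Remove exactly one wrapping quote from each side of every field.
# Slicing is used instead of str.strip('"') so that quotes belonging
# to the JSON text itself are never removed.
fields = [
    f[1:-1] if len(f) >= 2 and f[0] == '"' and f[-1] == '"' else f
    for f in raw
]

print(fields)
```

This only works while the JSON text contains no ';', since with quoting disabled every ';' counts as a delimiter.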

If the JSON data contains a ';', however, that would still be a problem, because with quote handling disabled every ';' is treated as a delimiter.

Another option in that case would be to read the first and last fields manually and treat everything between them as the JSON data.
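One way this could look, as a sketch only: the helper names unwrap and split_record are hypothetical, and the code assumes each record sits on a single line, has exactly three fields, and has no ';' in the first or last field.

```python
import json


def unwrap(field):
    """Hypothetical helper: drop one pair of wrapping quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def split_record(line):
    """Hypothetical helper: split one record into three fields.

    Assumes the first and last fields contain no ';', so the first and
    last ';' on the line are the real delimiters and everything between
    them is the JSON text.
    """
    line = line.rstrip('\r\n')
    first, rest = line.split(';', 1)      # cut at the first ';'
    middle, last = rest.rsplit(';', 1)    # cut at the last ';'
    return [unwrap(first), unwrap(middle), unwrap(last)]


# Sample line with a ';' inside the JSON value (made-up data).
sample = '"field1";"{"JSON-KEY": "a;b"}";"field3"\n'
fields = split_record(sample)
payload = json.loads(fields[1])

print(fields)
print(payload)
```

If the JSON text can span several lines, reading line by line like this would not be enough and the records would need to be reassembled first.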

Upvotes: 2
