DVN

Reputation: 21

csv parsing in python with quotes in values

I'm trying to parse a simple string with the csv module:

import csv
s = 'param="(a, b)", param2, param3'
list(csv.reader([s], skipinitialspace=True))

It splits the string into

[['param="(a', 'b)"', 'param2', 'param3']]

but I'd like to get

[['param="(a, b)"', 'param2', 'param3']]

It seems that the csv module only recognizes quoting when the quotes enclose the whole token.

How can I get the result I want?

Note: this is not a duplicate of Splitting with commas because in this case, each field is not quoted, only a part within the field. The answer(s) posted at that link (and the link to which that question is a duplicate) do not apply in this case, as evidenced by the above code (which recreates the same structure as the posted answers, and shows that it fails).

Upvotes: 2

Views: 312

Answers (1)

DSM

Reputation: 353009

Unfortunately the csv module doesn't handle text it considers inappropriately quoted very well, or so it seems. One option would be to fall back on regex, something like

>>> import re
>>> s = 'param="(a, b)", param2, param3'
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)"', 'param2', 'param3']
>>> s = 'param="(a, b)" "more quotes" "yet,more,quotes", param2, param3'
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)" "more quotes" "yet,more,quotes"', 'param2', 'param3']
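If you'd rather avoid a regex, the same idea can be written as a small character loop that only splits on commas seen outside a pair of double quotes. This is just a rough sketch: `split_outside_quotes` is a name I made up for illustration, and it assumes quotes are always balanced and never escaped.

```python
def split_outside_quotes(text, sep=",", quote='"'):
    """Split text on sep, ignoring any sep that sits between two quote characters.

    Hypothetical helper: assumes balanced, unescaped quotes.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # Every quote character flips the "inside quotes" state.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush whatever is left as the last field.
    fields.append("".join(current).strip())
    return fields


print(split_outside_quotes('param="(a, b)", param2, param3'))
# ['param="(a, b)"', 'param2', 'param3']
```

One difference from the regex: this keeps empty fields (`'a,,b'` gives `['a', '', 'b']`), whereas `findall` with the pattern above silently drops them.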

(If you control how the initial string is produced, generating a better-formatted string in the first place would be a much better approach.)
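For instance, assuming you could change the producer so that the quotes wrap the entire field rather than only part of it, the csv module handles the embedded comma by itself. The input string here is my own illustration, not the one from the question:

```python
import csv

# Assumed input: the whole first field is quoted, not just the "(a, b)" part.
s = '"param=(a, b)", param2, param3'
rows = list(csv.reader([s], skipinitialspace=True))
print(rows)
# [['param=(a, b)', 'param2', 'param3']]
```

Note that csv strips the enclosing quotes from the parsed field, so the result is not character-for-character what the question asks for.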

Upvotes: 2
