Reputation: 21
I'm trying to parse a simple string with the csv module:
import csv
s = 'param="(a, b)", param2, param3'
list(csv.reader([s], skipinitialspace=True))
It splits the string into
[['param="(a', 'b)"', 'param2', 'param3']]
but I'd like to get
[['param="(a, b)"', 'param2', 'param3']]
It seems that the csv module only recognizes quoted text when the quotes enclose the whole token.
How can I get the result I want?
Note: this is not a duplicate of Splitting with commas because in this case, each field is not quoted, only a part within the field. The answer(s) posted at that link (and the link to which that question is a duplicate) do not apply in this case, as evidenced by the above code (which recreates the same structure as the posted answers, and shows that it fails).
Upvotes: 2
Views: 312
Reputation: 353009
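Unfortunately, the csv module only treats a quote character as special when it opens a field; a quote in the middle of a field is read as an ordinary character, so it does not protect the commas that follow it. One option would be to fall back on a regex, something like
>>> import re
>>> s = 'param="(a, b)", param2, param3'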
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)"', 'param2', 'param3']
>>> s = 'param="(a, b)" "more quotes" "yet,more,quotes", param2, param3'
>>> re.findall(r'\s*((?:[^,\"]|\"[^\"]*\")+)\s*', s)
['param="(a, b)" "more quotes" "yet,more,quotes"', 'param2', 'param3']
(If you can control how the initial string is produced, starting from a better-formatted string would be a much more robust approach.)
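If a regex feels too opaque, the same idea (split on commas, but only when not inside double quotes) can be written as a small character scanner. This is only a minimal sketch: the function name `split_outside_quotes` is made up for illustration, it assumes quotes are always balanced and never escaped, and unlike the regex above it keeps empty fields.

```python
def split_outside_quotes(text, sep=",", quote='"'):
    """Split text on sep, ignoring any sep that falls inside a quoted span.

    Assumes quotes are balanced and there is no escape syntax for them.
    """
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # Toggle the quoted state but keep the quote character itself.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            # A separator outside quotes ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush whatever is left after the last separator.
    fields.append("".join(current).strip())
    return fields


print(split_outside_quotes('param="(a, b)", param2, param3'))
```

For the string from the question this prints `['param="(a, b)"', 'param2', 'param3']`.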
Upvotes: 2