Jason Strimpel

Reputation: 15476

Split string on comma when field contains a comma

Consider the following string:

'538.48,0.29,"533.59 - 540.00","AZO",102482,"+0.05%","N/A",0.00,535.09,"AutoZone, Inc. Co",538.77,"N/A"'

I need to split this into a list so it looks like the following:

[538.48, 0.29, "533.59 - 540.00", "AZO", 102482, "+0.05%", "N/A", 0.00, 535.09, "AutoZone, Inc. Co", 538.77, "N/A"]

The problem is that I can't use str.split(',') because the 10th field contains a comma. How can I best split the original string into a list when arbitrary fields may contain a comma?

Upvotes: 1

Views: 739

Answers (1)

Martijn Pieters

Reputation: 1122242

Use the csv module rather than attempting to split this yourself. It handles quoted values, including quoted values that contain the delimiter, out of the box:

>>> import csv
>>> from pprint import pprint
>>> data = '538.48,0.29,"533.59 - 540.00","AZO",102482,"+0.05%","N/A",0.00,535.09,"AutoZone, Inc. Co",538.77,"N/A"'
>>> reader = csv.reader(data.splitlines())
>>> pprint(next(reader))
['538.48',
 '0.29',
 '533.59 - 540.00',
 'AZO',
 '102482',
 '+0.05%',
 'N/A',
 '0.00',
 '535.09',
 'AutoZone, Inc. Co',
 '538.77',
 'N/A']

Note the 'AutoZone, Inc. Co' column value.

If you are reading this data from a file, pass the file object directly to csv.reader() rather than handing it a sequence of strings.
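As a minimal sketch of the file-based variant: the temporary file below is only an assumption made to keep the example self-contained, and in practice you would open your own path. Opening the file with newline="" follows the csv module documentation's recommendation so the reader handles line endings itself.

```python
import csv
import os
import tempfile

line = '538.48,0.29,"533.59 - 540.00","AZO",102482,"+0.05%","N/A",0.00,535.09,"AutoZone, Inc. Co",538.77,"N/A"\n'

# Write the sample line to a temporary file (a stand-in for your real data file).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(line)
    path = tmp.name

# Pass the open file object straight to csv.reader(); it iterates over the lines itself.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

os.remove(path)  # clean up the temporary file

print(rows[0][9])  # AutoZone, Inc. Co
```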

You can even have the unquoted values interpreted as floating point numbers by setting quoting=csv.QUOTE_NONNUMERIC. Note that integers such as 102482 then come back as floats:

>>> reader = csv.reader(data.splitlines(), quoting=csv.QUOTE_NONNUMERIC)
>>> pprint(next(reader))
[538.48,
 0.29,
 '533.59 - 540.00',
 'AZO',
 102482.0,
 '+0.05%',
 'N/A',
 0.0,
 535.09,
 'AutoZone, Inc. Co',
 538.77,
 'N/A']
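The list in the question keeps 102482 as an integer, whereas QUOTE_NONNUMERIC returns 102482.0. If that distinction matters, one possible workaround is to parse with the default reader and convert each field afterwards. The coerce() helper below is hypothetical, not part of the csv module, and it assumes that any field that looks numeric should be converted, so a quoted value such as "123" would become an integer as well.

```python
import csv

def coerce(field):
    """Hypothetical helper: try int, then float, otherwise keep the string."""
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            pass
    return field

data = '538.48,0.29,"533.59 - 540.00","AZO",102482,"+0.05%","N/A",0.00,535.09,"AutoZone, Inc. Co",538.77,"N/A"'

# csv.reader accepts any iterable of lines, so a one-element list works here.
row = [coerce(field) for field in next(csv.reader([data]))]
print(row)
```

Here row[4] is the integer 102482, the unquoted decimals are floats, and fields such as 'N/A' and '+0.05%' stay as strings because neither int() nor float() accepts them.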

Upvotes: 2
