Reputation: 905
I would like to split a string on commas, but ignore any comma that is inside quotation marks.
For example:
teststring = '48, "one, two", "2011/11/03"'
teststring.split(",")
['48', ' "one', ' two"', ' "2011/11/03"']
and the output I would like is:
['48', ' "one, two"', ' "2011/11/03"']
Is this possible?
Upvotes: 17
Views: 22242
Reputation: 1099
tssplit is not part of the standard library, so you have to install it via pip, but it is an option:
In [5]: from tssplit import tssplit
In [6]: tssplit('48, "one, two", "2011/11/03"', quote='"', delimiter=',', trim=' ')
Out[6]: ['48', 'one, two', '2011/11/03']
Upvotes: 7
Reputation: 94
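import re
import shlex
teststring = '48, "one, two", "2011/11/03"'
# shlex.split splits on whitespace but keeps quoted text together
output = shlex.split(teststring)
# strip the trailing comma left on each token
output = [re.sub(r",$", "", w) for w in output]
print(output)
['48', 'one, two', '2011/11/03']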
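One caveat worth checking before relying on this: shlex.split separates on whitespace, not on commas, so the result depends on every comma being followed by a space and on unquoted fields containing no spaces. A minimal sketch of that behaviour, where split_with_shlex is a hypothetical wrapper around the code above:

```python
import re
import shlex


def split_with_shlex(text):
    """Hypothetical wrapper: whitespace split, then drop trailing commas."""
    return [re.sub(r",$", "", token) for token in shlex.split(text)]


# An unquoted field containing a space is broken into two items.
spaced = split_with_shlex('a b, "c, d"')

# With no space after the comma, nothing is split at all.
unspaced = split_with_shlex('48,"one, two"')

print(spaced)
print(unspaced)
```

If the data can contain either case, a comma-aware parser such as the csv module is probably the safer choice.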
Upvotes: 1
Reputation: 193814
You can use the csv module from the standard library:
>>> import csv
>>> testdata = ['48, "one, two", "2011/11/03"']
>>> testcsv = csv.reader(testdata,skipinitialspace=True)
>>> next(testcsv)
['48', 'one, two', '2011/11/03']
The one thing to watch out for is that csv.reader expects an iterable that yields one string (one line) per iteration. This means that you can't pass a bare string straight to reader(), but you can enclose it in a list as above.
You'll have to be careful with the format of your data or tell csv how to handle it. By default, the opening quote has to come immediately after the comma, or the csv module will interpret the field as beginning with a space rather than as being quoted. You can fix this using the skipinitialspace option.
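To make the effect of skipinitialspace concrete, here is a small sketch that parses the same string with and without the option. The variable names default_fields and fixed_fields are only for illustration:

```python
import csv

teststring = '48, "one, two", "2011/11/03"'

# Default dialect: the space after each comma starts an unquoted field,
# so the quote characters are kept literally and the inner comma still splits.
default_fields = next(csv.reader([teststring]))

# skipinitialspace=True drops whitespace right after the delimiter,
# so each quote is seen at the start of its field and is honoured.
fixed_fields = next(csv.reader([teststring], skipinitialspace=True))

print(default_fields)
print(fixed_fields)
```

Without the option, the result is the same four-item list that a plain split(",") produces. With it, the quoted fields come back whole and without their quotes.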
Upvotes: 9
Reputation: 226694
The csv module will work if you set options to handle this dialect:
>>> import csv
>>> teststring = '48, "one, two", "2011/11/03"'
>>> for line in csv.reader([teststring], skipinitialspace=True):
...     print(line)
...
['48', 'one, two', '2011/11/03']
Upvotes: 31
Reputation: 40424
You can use the shlex module to parse your string.
By default, shlex.split will split your string at whitespace characters not enclosed in quotes:
>>> import shlex
>>> teststring = '48, "one, two", "2011/11/03"'
>>> shlex.split(teststring)
['48,', 'one, two,', '2011/11/03']
This doesn't remove the trailing commas from your string, but it's close to what you need. However, if you customize the parser to treat the comma as a whitespace character, then you'll get the output that you need:
>>> parser = shlex.shlex(teststring)
>>> parser.whitespace
' \t\r\n'
>>> parser.whitespace += ','
>>> list(parser)
['48', '"one, two"', '"2011/11/03"']
Note: the parser object is used as an iterator to get the tokens one by one. Hence, list(parser) iterates over the parser object and returns the string split where you need it.
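The customized parser can be wrapped in a small function. This is only a sketch: split_on_commas and its keep_quotes flag are hypothetical names, and the posix=True branch is an assumption added here to show how the surrounding quotes could be dropped instead of kept.

```python
import shlex


def split_on_commas(text, keep_quotes=True):
    """Hypothetical helper: tokenize text, treating ',' as whitespace."""
    # Non-POSIX mode (the shlex.shlex default) keeps the quote characters
    # in each token; POSIX mode strips them.
    parser = shlex.shlex(text, posix=not keep_quotes)
    parser.whitespace += ','
    return list(parser)


teststring = '48, "one, two", "2011/11/03"'
print(split_on_commas(teststring))
print(split_on_commas(teststring, keep_quotes=False))
```

One thing to keep in mind: an unquoted field containing characters outside parser.wordchars (a '/' in an unquoted date, for example) would be broken into several tokens, so this works best when such fields are always quoted.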
Upvotes: 7
Reputation: 50567
You should use the Python csv library: http://docs.python.org/library/csv.html
Upvotes: 3