Clay Shirky

Reputation: 133

Python csv module splitting strings, not just fields

When I run this input (saved as variable 'line'):

xsc_i,202,"House of Night",21,"/21_202"

through a csv reader:

for row in csv.reader(line):
    print row

it splits the strings into individual characters, not just the fields:

['x']
['s']
['c']
['_']
['i']
['', '']
['2']
['0']
['2']
['', '']

etc.

It exhibits this behavior even if I explicitly set the delimiter:

csv.reader(line, delimiter=",")

It seems to treat the string itself as an array of characters, but I can't figure out why. I can't simply split on commas, because many commas in the input sit inside double-quoted strings.

Python 2.7, if it matters.

Upvotes: 2

Views: 537

Answers (3)

kotlet schabowy

Reputation: 918

This is because csv.reader expects

any object which supports the iterator protocol and returns a string each time its next() method is called

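You passed a single string to the reader. Iterating over a string yields one character at a time, so each character is parsed as its own row.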

If you wrap the line in a list instead:

line = ['xsc_i,202,"House of Night",21,"/21_202"',]

Your code should work as expected. See the csv module documentation for csv.reader.
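Since the reader only needs an iterable of row strings, the same idea extends to several rows held in one string. A minimal sketch, assuming the text has no newlines inside quoted fields (the second row is made up for illustration, and print() is written so it runs on both Python 2.7 and 3):

```python
import csv

# Hypothetical two-row input; the second row is invented for this example.
text = ('xsc_i,202,"House of Night",21,"/21_202"\n'
        'abc_j,303,"Night, House",22,"/22_303"')

# splitlines() returns a list of strings, one per row, which is
# the kind of iterable csv.reader expects.
rows = list(csv.reader(text.splitlines()))

for row in rows:
    print(row)
```

If a quoted field may itself contain a line break, splitlines() would cut the row in two, so reading from a file object is the safer choice in that case.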

Upvotes: 1

vks

Reputation: 67988

In case you want to see the re module in action, here is a regex-based alternative:

import re
line='xsc_i,202,"House of Night",21,"/21_202"'
print map(lambda x:x.strip('"'),re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)',line))

Output: ['xsc_i', '202', 'House of Night', '21', '/21_202']
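One caveat worth checking before relying on the regex: it is not a full CSV parser. The sketch below compares it with the csv module on a made-up line containing doubled quotes (the usual CSV way of escaping a quote inside a quoted field). The input line is an assumption for illustration and does not come from the question's data.

```python
import csv
import re

# Hypothetical input: the middle field contains escaped (doubled) quotes.
line = 'a,"say ""hi""",b'

# Regex approach from above: split on commas outside quotes,
# then strip the surrounding quote characters.
pattern = r',(?=(?:[^"]*"[^"]*")*[^"]*$)'
regex_fields = [field.strip('"') for field in re.split(pattern, line)]

# csv approach: wrap the single line in a list and take the only row.
csv_fields = next(iter(csv.reader([line])))

print(regex_fields)  # ['a', 'say ""hi', 'b']
print(csv_fields)    # ['a', 'say "hi"', 'b']
```

The regex leaves the doubled quotes in place and strip() eats the trailing ones, while csv.reader unescapes them, so the two only agree on simple input like the line in the question.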

Upvotes: 1

alecxe

Reputation: 474191

The first argument to csv.reader() is expected to be an iterable object containing csv rows. In your case the input is a string (which is also iterable) containing a single row, so the reader iterates over it character by character and treats each character as a row. You need to wrap the line in a list:

for row in csv.reader([line]):
    print row

Demo:

>>> import csv
>>> line = 'xsc_i,202,"House of Night",21,"/21_202"'
>>> for row in csv.reader([line]):
...     print row
... 
['xsc_i', '202', 'House of Night', '21', '/21_202']
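To see why the bare string produces one row per character, here is a small sketch on a shorter, made-up input ('a,b' is chosen only to keep the output readable):

```python
import csv

line = 'a,b'

# Iterating over a string yields one character at a time.
chars = list(line)                     # ['a', ',', 'b']

# csv.reader therefore sees three one-character "rows".
# The lone comma parses as two empty fields.
split_rows = list(csv.reader(line))    # [['a'], ['', ''], ['b']]

# Wrapped in a list, the whole string is a single row.
whole_rows = list(csv.reader([line]))  # [['a', 'b']]

print(split_rows)
print(whole_rows)
```

The ['', ''] entries in the question's output are those lone commas, each read as a row with an empty field on either side of the delimiter.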

Upvotes: 6
