pylover

Reputation: 8075

python csv module strip lines before parsing

I have a plain text file:

    2 jordyt
    2 dawder
    2 LOL12345
    2 2251084185
    2 123456789
    2 123456
    1 warcraft
    1 tripp88

After parsing it with Python's csv module, I get:

import csv

with open(filename, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        print(row)

['', '', '', '', '', '', '2', 'jordyt']
['', '', '', '', '', '', '2', 'dawder']
['', '', '', '', '', '', '2', 'LOL12345']
['', '', '', '', '', '', '2', '2251084185']
['', '', '', '', '', '', '2', '123456789']
['', '', '', '', '', '', '2', '123456']
['', '', '', '', '', '', '1', 'warcraft']
['', '', '', '', '', '', '1', 'tripp88']

EDIT 1:

I expect the output like this:

['2', 'jordyt']
['2', 'dawder']
['2', 'LOL12345']
.
.
.

I can fix this with a custom pre-processor, but the files are very large and I would rather not read them twice.

My question is: how can I tell the csv module to strip each line before parsing it?

Upvotes: 0

Views: 2848

Answers (3)

Andrew Clark

Reputation: 208665

One option is to pass skipinitialspace=True to csv.reader, so that spaces immediately following the delimiter are ignored:

import csv

with open(filename, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
    for row in reader:
        print(row)
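To try this without a file on disk, here is a minimal sketch (Python 3) that feeds a few of the question's lines through io.StringIO. The `sample` string and the `parse_rows` helper are made up for illustration; only `csv.reader` and its `skipinitialspace` option come from the answer above.

```python
import csv
import io

# Hypothetical in-memory stand-in for the real file: each line has leading spaces.
sample = "      2 jordyt\n      2 dawder\n      1 warcraft\n"

def parse_rows(text):
    """Parse space-delimited text, skipping spaces at the start of each field."""
    reader = csv.reader(io.StringIO(text), delimiter=' ', skipinitialspace=True)
    return list(reader)

rows = parse_rows(sample)
print(rows)  # [['2', 'jordyt'], ['2', 'dawder'], ['1', 'warcraft']]
```

The file is still read only once, since the reader consumes it line by line.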

Upvotes: 5

Jon Clements

Reputation: 142226

If your delimiter is a space, I would be tempted not to use the csv module at all (provided you know you have no quoted fields containing spaces). This takes advantage of the way split() or split(None) treats runs of whitespace as a single separator and ignores leading and trailing whitespace:

with open(filename, 'r') as csvfile:
    for row in csvfile:
        print(row.split())
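The difference between split() and split(' ') is easy to see on a single line from the question. This is just an illustrative sketch; the `line` value is copied from the sample data, with six leading spaces assumed to match the output shown in the question.

```python
# One line of the input, including its leading spaces and trailing newline.
line = "      2 jordyt\n"

# An explicit ' ' separator yields an empty string for every extra space,
# which is essentially what csv.reader(delimiter=' ') produced in the question.
with_sep = line.split(' ')
print(with_sep)     # ['', '', '', '', '', '', '2', 'jordyt\n']

# With no argument, runs of whitespace count as one separator and
# leading/trailing whitespace (including the newline) is dropped.
without_sep = line.split()
print(without_sep)  # ['2', 'jordyt']
```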

Or, if you do need the csv module, create a generator over your input file and pass that to the reader:

import csv

with open(filename, 'r') as csvfile:
    # The generator strips each line lazily, so the file is read only once.
    stripped = (row.strip() for row in csvfile)
    reader = csv.reader(stripped, delimiter=' ')
    for row in reader:
        print(row)
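The generator approach does not read the file twice, because lines are stripped only as the reader asks for them. The sketch below (Python 3) tries to make that visible. The `seen` list, the `stripped_lines` helper, and the quoted "john doe" field are all hypothetical additions for illustration, the last one showing a case where the csv module still earns its keep over plain split().

```python
import csv
import io

# Hypothetical sample; the second line has a quoted field containing a space.
source = io.StringIO('      2 jordyt\n      1 "john doe"\n')

seen = []  # records every raw line actually pulled from the source

def stripped_lines(lines):
    """Yield each line with surrounding whitespace removed, one at a time."""
    for line in lines:
        seen.append(line)
        yield line.strip()

reader = csv.reader(stripped_lines(source), delimiter=' ')

first_row = next(reader)
print(first_row)   # ['2', 'jordyt']
print(len(seen))   # 1 (only one line has been consumed so far)

rest = list(reader)
print(rest)        # [['1', 'john doe']]
```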

Upvotes: 4

Lee-Man

Reputation: 414

I question your use of csv in this case, since split() will do what you want.

with open(filename, 'r') as csvfile:
    for row in csvfile:
        words = row.split()
        print(words)

prints (for your data):

['2', 'jordyt']
['2', 'dawder']
['2', 'LOL12345']
['2', '2251084185']
['2', '123456789']
['2', '123456']
['1', 'warcraft']
['1', 'tripp88']

Upvotes: 1
