Reputation: 8075
I have a plain text file in which every line starts with several spaces:
      2 jordyt
      2 dawder
      2 LOL12345
      2 2251084185
      2 123456789
      2 123456
      1 warcraft
      1 tripp88
After parsing it with Python's csv module, I have:
import csv

with open(filename, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        print(row)
['', '', '', '', '', '', '2', 'jordyt']
['', '', '', '', '', '', '2', 'dawder']
['', '', '', '', '', '', '2', 'LOL12345']
['', '', '', '', '', '', '2', '2251084185']
['', '', '', '', '', '', '2', '123456789']
['', '', '', '', '', '', '2', '123456']
['', '', '', '', '', '', '1', 'warcraft']
['', '', '', '', '', '', '1', 'tripp88']
EDIT 1:
I expect output like this:
['2', 'jordyt']
['2', 'dawder']
['2', 'LOL12345']
.
.
.
I can fix this problem with a custom pre-processor, but the files are very big and it's not good to read them twice.
My question is: how can I tell the csv module to strip each line before parsing it?
Upvotes: 0
Views: 2848
Reputation: 208665
One option is to provide the skipinitialspace parameter:
import csv

with open(filename, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
    for row in reader:
        print(row)
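To try this without touching the real file, here is a small self-contained sketch. It assumes Python 3 and two sample lines with six leading spaces each (the count implied by the six empty strings in the question's output). io.StringIO stands in for the opened file, and the names sample and rows are only illustrative.

```python
import csv
import io

# Two sample lines, each with six leading spaces (assumed from the question's output).
sample = "      2 jordyt\n      1 warcraft\n"

# io.StringIO stands in for the real file object.
reader = csv.reader(io.StringIO(sample), delimiter=' ', skipinitialspace=True)
rows = list(reader)

for row in rows:
    print(row)
```

With skipinitialspace=True the reader ignores spaces at the start of a field, so the leading spaces no longer turn into empty fields.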
Upvotes: 5
Reputation: 142226
If your delimiter is a space, I would be tempted not to use the csv module at all (provided you know you have no quoted fields containing spaces):
with open(filename, 'r') as csvfile:
    for row in csvfile:
        print(row.split())
This takes advantage of the way split() or split(None) deals with consecutive delimiters: a run of whitespace counts as a single separator, and leading and trailing whitespace is ignored.
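The difference between split() and an explicit single-space separator can be seen in a short sketch. The sample line below is an assumption (six leading spaces, matching the question's output), and the variable names are only illustrative.

```python
# One sample line with six leading spaces (assumed from the question's output).
line = "      2 jordyt\n"

# An explicit ' ' separator treats every single space as a delimiter,
# so each leading space produces an empty string.
explicit = line.rstrip("\n").split(' ')

# With no argument, split() collapses runs of whitespace and ignores
# leading and trailing whitespace, including the newline.
default = line.split()

print(explicit)
print(default)
```

The first result has the same shape as the rows the csv reader produced in the question, which is why splitting on whitespace in general, rather than on each individual space, avoids the empty fields.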
Or, if you do need to use the csv module, create a generator that strips each line of your input file and pass that to the reader:
import csv

with open(filename, 'r') as csvfile:
    # Lazily strip each line as the reader consumes it.
    stripped = (row.strip() for row in csvfile)
    reader = csv.reader(stripped, delimiter=' ')
    for row in reader:
        print(row)
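As a rough check that the generator approach reads the input only once, here is a self-contained sketch. It assumes Python 3, uses io.StringIO in place of the real file, and the sample data and names are illustrative rather than taken from the actual files.

```python
import csv
import io

# Sample lines with six leading spaces each (assumed from the question's output).
sample = "      2 jordyt\n      1 warcraft\n"
source = io.StringIO(sample)  # stands in for the opened file

# Generator expression: a line is stripped only when the reader asks for it,
# so the input is consumed in a single pass and never held in memory as a whole.
stripped = (line.strip() for line in source)

# csv.reader accepts any iterable that yields strings, not just file objects.
reader = csv.reader(stripped, delimiter=' ')
rows = list(reader)

for row in rows:
    print(row)
```

Note that strip() only removes whitespace at the ends of each line. If fields can be separated by more than one space, the reader will still produce empty fields between them unless skipinitialspace=True is also passed.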
Upvotes: 4
Reputation: 414
I question your use of csv in this case, since split() will do what you want.
with open(filename, 'r') as csvfile:
    for row in csvfile:
        words = row.split()
        print(words)
prints (for your data):
['2', 'jordyt']
['2', 'dawder']
['2', 'LOL12345']
['2', '2251084185']
['2', '123456789']
['2', '123456']
['1', 'warcraft']
['1', 'tripp88']
Upvotes: 1