Reputation: 3236
A few weeks ago I wrote a CSV parser in Python, and it worked great with the text file I was given. But when we tried to test it with other files, the problems started.
First was the
ValueError: empty string for float()
for a string like "313.44". The problem was that the file was in Unicode, so there were null bytes ('\x00') between the digits.
OK, I decided to read it as Unicode with
codecs.open(filename, 'r', 'utf-16')
And then all hell broke loose: missing BOM, problems with line-ending characters (LF vs. CR+LF), etc.
So can you give me a hint or a workaround for parsing both Unicode and non-Unicode files when I do not know what the encoding is, whether a BOM is present, what the line endings are, etc.?
P.S. I am using Python 2.7
Upvotes: 1
Views: 1614
Reputation: 3236
The problem was solved using the csv module, as proposed by Daenyth.
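For anyone landing here later, a rough sketch of how this could look: check for a BOM, fall back to a null-byte heuristic when there is none, decode, and let the csv module deal with the line endings. This is not the exact code I ended up with. The names `guess_encoding`, `parse_bytes` and `read_rows` are made up for illustration, the `'utf-8'` fallback is an assumption (pick whatever fits your files), and it is written in Python 3 style. On Python 2.7 the csv module wants byte strings, so the decoded text would have to be re-encoded (e.g. to UTF-8) before it goes to `csv.reader`.

```python
import codecs
import csv
import io

# UTF-32 BOMs are checked first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(raw, default='utf-8'):
    """Guess the encoding of a byte string (heuristic, not a guarantee)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # These codecs consume the BOM while decoding.
            return name
    # No BOM: mostly-ASCII text in UTF-16 has a NUL in every other byte.
    odd_nuls = raw[1::2].count(b'\x00')
    even_nuls = raw[0::2].count(b'\x00')
    if odd_nuls > even_nuls:
        return 'utf-16-le'
    if even_nuls > odd_nuls:
        return 'utf-16-be'
    return default  # assumption: the caller knows a sensible fallback


def parse_bytes(raw, default='utf-8'):
    """Decode raw file contents and return the CSV rows as lists of strings."""
    text = raw.decode(guess_encoding(raw, default))
    # newline='' leaves the line endings alone so csv can handle
    # LF and CR+LF itself, including newlines inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline='')))


def read_rows(path, default='utf-8'):
    """Read a CSV file of unknown encoding from disk."""
    with open(path, 'rb') as f:
        return parse_bytes(f.read(), default)
```

With this, `float(row[1])` no longer trips over stray `'\x00'` bytes, because the text is decoded before it is split into fields. The null-byte check is only a guess and can be fooled, so a BOM or a known encoding is always preferable when you have one.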
Upvotes: 1
Reputation: 6012
It mainly depends on the Python version you are using, but these two links should help you out:
Upvotes: 0