David

Reputation: 5016

Efficiently reading a csv file with windows newline on linux in Python

The following works under Windows for reading CSV files line by line.

f = open(filename, 'r')

for line in f:
    ...

However, when the CSV file is copied to a Linux server, it fails.

It should be mentioned that performance is an issue, as the CSV files are huge. I am therefore concerned about the string copies made by operations like strip().

Upvotes: 2

Views: 2751

Answers (5)

T.E.D.

Reputation: 44804

Actually, the most efficient way to read any file is in one big I/O operation. There isn't always enough RAM to do that, but the fewer I/Os, the better.
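
As an illustration, a minimal sketch of that approach, assuming the whole file fits in RAM (read_lines_big_io and the encoding argument are names made up here, not from the answer):

def read_lines_big_io(filename, encoding='utf-8'):
    # Read the entire file in a single I/O call, then split it into lines.
    # splitlines() recognises '\n', '\r\n' and '\r', so a Windows-created
    # file is split correctly on Linux without any extra stripping.
    with open(filename, 'rb') as f:
        data = f.read()
    return data.decode(encoding).splitlines()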

Upvotes: 0

John Machin

Reputation: 82934

Ummm... You have CSV files, you are using Python, so why not read the files using the Python csv module?

Upvotes: 4

John La Rooy

Reputation: 304157

If performance is important, why are you not using csv.reader?
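
For illustration, a minimal csv.reader sketch, assuming Python 3 (opening with newline='' lets the csv module handle the '\r\n' endings itself; 'data.csv' is a placeholder path):

import csv

with open('data.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        ...  # each row is already a list of field values, no strip() needed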

Upvotes: 6

Sean Cavanagh

Reputation: 4917

The dos2unix utility will convert the line endings very efficiently. If the files are that large, I would run that command as part of the copy.
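
One way to fold the conversion into a Python-side copy step, as a sketch (it assumes the dos2unix utility is installed on the target machine; copy_and_convert is a made-up name):

import shutil
import subprocess

def copy_and_convert(src, dst):
    # Copy the file, then rewrite its line endings in place with dos2unix.
    shutil.copy(src, dst)
    subprocess.check_call(['dos2unix', dst])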

Upvotes: 1

AndiDog

Reputation: 70158

Python has built-in support for Windows, Linux and Mac line endings:

f = open(filename, 'rU')  # 'U' enables universal newline handling (Python 2)

for line in f:
    ...

If you really don't want slow string operations, you should convert the files before processing them. You can either use dos2unix (found in the Debian package "tofrodos") or (easier) transfer the files in FTP text mode, which should do the conversion automatically.

Upvotes: 7
