user653861

Reputation: 43

Read a very large file with Python

What is the best solution to process each line of a text file whose size is about 500 MB?

Here is the approach I had in mind:

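def files(mon_fichier):
    # Yield the file's contents in blocks of at most 1024 characters
    while True:
        data = mon_fichier.read(1024)
        if not data:
            break
        yield data

# The with statement closes the file once the loop is done
with open('tonfichier.txt', 'r') as fichier:
    for bloc in files(fichier):
        print(bloc)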

Thank you in advance

Upvotes: 2

Views: 3844

Answers (4)

eyquem

Reputation: 27585

As far as I understand the process, a file is read through a buffer.

In that case, mon_fichier.read(1024) doesn't fetch 1024 bytes directly from the file but from the buffer, until the buffer is exhausted. The buffer is then refilled by a real read of, say, 4096, 8192, or 16384 bytes. I don't know the exact size (I think it's a power of 2, but I'm not even sure of that).
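As a rough illustration of the buffering described above, here is a minimal sketch assuming Python 3, where `io.DEFAULT_BUFFER_SIZE` reports the default buffer size and the `buffering` argument of `open` overrides it. The helper name `read_in_blocks` and the temporary demo file are hypothetical, not part of the question.

```python
import io
import os
import tempfile

def read_in_blocks(path, block_size=1024, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """Yield successive blocks of at most block_size bytes from path."""
    # buffering=buffer_size sets the size of the internal buffer; each
    # read(block_size) is served from that buffer, which is refilled
    # from the underlying file when it runs out.
    with open(path, 'rb', buffering=buffer_size) as f:
        while True:
            block = f.read(block_size)
            if not block:  # empty bytes object means end of file
                break
            yield block

# Demo on a small temporary file of 10,000 bytes (hypothetical data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 10000)
    demo_path = tmp.name

blocks = list(read_in_blocks(demo_path))
os.remove(demo_path)
print(len(blocks), len(blocks[-1]))  # 10 blocks, the last one 784 bytes
```

The block size passed to `read` and the buffer size are independent: small reads simply mean more calls served from the same buffer.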

So, if you really want to process blocks of bytes, I think philnext's code is preferable. But readline(1000) must be replaced with read(1000) if you want to fetch exactly 1000 bytes: readline(1000) returns at most one line and never reads past the end of it, even if the line is only 4 characters long.
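The difference between the two calls can be checked on a small in-memory stream. This is a sketch assuming Python 3; `io.StringIO` stands in for a real text file, and the sample content is made up for the demonstration.

```python
import io

# An in-memory text stream standing in for a real file:
# one short line, then one line of 3000 characters.
sample = io.StringIO("abcd\n" + "x" * 3000 + "\n")

first = sample.readline(1000)   # stops at the newline: 'abcd\n'
second = sample.readline(1000)  # no newline within reach: 1000 characters

sample.seek(0)
block = sample.read(1000)       # 1000 characters, newline included

print(repr(first), len(second), len(block))  # 'abcd\n' 1000 1000
```

So `readline(1000)` treats 1000 as an upper bound per line, while `read(1000)` fills the block regardless of line boundaries (until the end of the file).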

Processing a file by blocks may be what you really want, but it seems uncommon to me. It is more common to process a file line by line, and in that case Hugh Bothwell's code is the right way to do it.

Upvotes: 0

philnext

Reputation: 3402

The answer depends on what you want to do with the data. I recommend reading by blocks and processing each block right after reading it, like this:

fs = open(source, 'r')
while True:
    txt = fs.readline(1000)  # at most 1000 characters, stopping at a newline
    if txt == "":
        break  # an empty string means end of file
    # <your treatment>
fs.close()

Upvotes: 1

Hugh Bothwell

Reputation: 56714

with open('myfile.txt') as inf:
    for line in inf:
        # do something
        pass

Upvotes: 11

filmor

Reputation: 32298

Just using the standard file operations should work, as long as you keep away from readlines, which loads every line of the file into a list at once, and use readline instead.
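To illustrate: `readlines()` builds a list holding every line of the file, whereas `readline()` returns one line per call, so only the current line has to be kept in memory. A minimal sketch, assuming Python 3; the function name `count_lines_with_readline` and the temporary demo file are hypothetical.

```python
import os
import tempfile

def count_lines_with_readline(path):
    """Count lines while holding only one line in memory at a time."""
    count = 0
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if not line:  # empty string means end of file
                break
            count += 1    # replace with the real per-line processing
    return count

# Demo on a small temporary file (hypothetical data).
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

n = count_lines_with_readline(demo_path)
os.remove(demo_path)
print(n)  # 3
```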

Upvotes: 7
