user2013985

Reputation: 21

Loading 15GB file in Python

I have a 15GB text file containing 25000 lines. I am creating a multi-level dictionary in Python of the form: dict1 = {'':int}, dict2 = {'':dict1}.

I have to use this entire dict2 multiple times (about 1000, in a for loop) in my program. Can anyone suggest a good way to do this?

The file stores the same type of information throughout: the counts of the different RGB values in the 25000 images, one image per line. For example, lines of the file look like: image1 : 255,255,255-70 ; 234,221,231-40 ; 112,13,19-28 ; image2 : 5,25,25-30 ; 34,15,61-20 ; 102,103,109-228 ; and so on.
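
For reference, the structure I end up with is roughly this (a sketch based on the sample lines above; the exact key types may differ):

dict2 = {
    'image1': {'255,255,255': 70, '234,221,231': 40, '112,13,19': 28},
    'image2': {'5,25,25': 30, '34,15,61': 20, '102,103,109': 228},
}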

Upvotes: 2

Views: 560

Answers (2)

Matt Alcock

Reputation: 12881

The best way to do this is to use chunking.

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)

As a note, as you start to process larger files, moving to a map-reduce idiom may help, since you will be able to work on separate chunked files independently without pulling the complete data set into memory.
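
For example, a map-reduce style sketch using the standard multiprocessing module might look like the following (the chunk file names and the per-chunk counting are only illustrative, not tied to the question's format):

from multiprocessing import Pool

def process_chunk(path):
    """Map step: compute a small partial result from one chunk file."""
    counts = {}
    with open(path) as f:
        for line in f:
            key = line.split(':', 1)[0].strip()
            counts[key] = counts.get(key, 0) + 1
    return counts

def merge(partials):
    """Reduce step: combine the per-chunk dictionaries into one."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total

if __name__ == '__main__':
    chunk_files = ['chunk_000.dat', 'chunk_001.dat']  # hypothetical pre-split files
    pool = Pool()
    partials = pool.map(process_chunk, chunk_files)
    pool.close()
    pool.join()
    total = merge(partials)

Each chunk file is handled in its own worker process, so only the small per-chunk dictionaries need to be in memory at any one time.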

Upvotes: 2

giodamelio

Reputation: 5605

In Python, if you use a file object as an iterator, you can read a file line by line without loading the whole thing into memory.

with open("huge_file.txt") as f:
    for line in f:
        do_something_with(line)
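
Applied to the format in the question, a sketch might look like this (the separators are assumed from the example line, and the int conversion assumes the counts are plain integers):

counts = {}
with open("huge_file.txt") as f:
    for line in f:
        # e.g. "image1 : 255,255,255-70 ; 234,221,231-40 ; ..."
        image, _, rest = line.partition(':')
        rgb_counts = {}
        for entry in rest.split(';'):
            entry = entry.strip()
            if not entry:
                continue
            rgb, _, count = entry.rpartition('-')
            rgb_counts[rgb.strip()] = int(count)
        counts[image.strip()] = rgb_counts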

Upvotes: 1
