Reputation: 21
I have a 15 GB text file containing 25,000 lines. I am building a multi-level dictionary in Python of the form dict1 = {'': int}, dict2 = {'': dict1}.
I have to use this entire dict2 many times (about 1000, in a for loop) in my program. Can anyone please suggest a good way to do that?
The same type of information is stored on every line of the file (counts of the different RGB values in 25,000 images, one image per line). For example, two lines of the file would look like:
image1 : 255,255,255-70 ; 234,221,231-40 ; 112,13,19-28 ;
image2 : 5,25,25-30 ; 34,15,61-20 ; 102,103,109-228 ;
and so on.
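For concreteness, for the two example lines above, the structure being built would be (values copied from the example; this is my reading of the format):

dict2 = {
    'image1': {'255,255,255': 70, '234,221,231': 40, '112,13,19': 28},
    'image2': {'5,25,25': 30, '34,15,61': 20, '102,103,109': 228},
}

Each inner dictionary is a dict1 mapping an RGB string to its count.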
Upvotes: 2
Views: 560
Reputation: 12881
The best way to do this is to use chunking.
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
As a note, as you start to process large files, moving to a map-reduce idiom may help, since you'll be able to work on separate chunked files independently without pulling the complete data set into memory.
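A minimal sketch of that idea, assuming each line of your file is independent (one image per line, as in the question); count_colors is a hypothetical stand-in for whatever per-line work you need, and the file name is the same placeholder as above:

from multiprocessing import Pool

def count_colors(line):
    # "map" step: turn one independent line into a small result;
    # here, the image name and how many RGB entries it carries
    name, _, colors = line.partition(':')
    return name.strip(), colors.count('-')

if __name__ == '__main__':
    with Pool() as pool, open('really_big_file.dat') as f:
        # "reduce" step: combine the per-line results as they arrive;
        # the parent never holds more than a window of lines in memory
        for name, n in pool.imap_unordered(count_colors, f, chunksize=100):
            print(name, n)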
Upvotes: 2
Reputation: 5605
In Python, if you use a file object as an iterator, you can read a file line by line without loading the whole thing into memory.
for line in open("huge_file.txt"):
    do_something_with(line)
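Applied to the question's format, a minimal sketch that builds the nested dictionary while iterating line by line; the parsing is an assumption based on the example line, not a tested parser:

dict2 = {}
with open("huge_file.txt") as f:
    for line in f:
        # assumed format: "image1 : 255,255,255-70 ; 234,221,231-40 ; ..."
        name, _, rest = line.partition(':')
        dict1 = {}
        for entry in rest.split(';'):
            entry = entry.strip()
            if not entry:
                continue
            # split "255,255,255-70" into the RGB key and its count
            rgb, _, count = entry.rpartition('-')
            dict1[rgb.strip()] = int(count)
        dict2[name.strip()] = dict1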
Upvotes: 1