Reputation: 642
I am trying to convert the encoding of a large text file (more than 5 GB). Following this post, I managed to convert the encoding of a text file into a readable format with this:
import os

path = 'path/to/file'
des_path = 'path/to/store/file'
for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()
The problem is that when I try to convert a text file with a large size (more than 5 GB), I get this error:
Traceback (most recent call last):
  File "Desktop/convertfile.py", line 12, in <module>
    string = f.read()
  File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
which I understand means it cannot read a file this large in one go. I found from several links that I can avoid this by reading line by line.
So, how can I adapt my code to read line by line? My understanding of reading line by line here is that I need to read a line from f and write it to t, repeating until the end of the file, right?
Upvotes: 0
Views: 120
Reputation: 25023
You can iterate over the lines of an open file.
for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:
        out.write(line)
    inp.close()
    out.close()
Note that I've hidden the complexity of the different paths, encodings and modes in an open_files function that I suggest you actually write...
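The helper isn't spelled out above; a minimal sketch, assuming the same path, des_path and encodings as in your question (the UTF-8 output encoding is my assumption), might look like:

def open_files(filename):
    # Open the source with the legacy encoding and the destination
    # with the target encoding (assumed to be UTF-8 here).
    inp = open('{}/{}'.format(path, filename), encoding='iso-8859-11')
    out = open('{}/{}'.format(des_path, filename), 'w', encoding='utf-8')
    return inp, out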
Re buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be noticeably slower than a more complex solution.
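If you do want explicit fixed-size chunks instead of lines (e.g. in case some file has very long lines), the standard library's shutil.copyfileobj copies between open file objects in chunks; a sketch, with src and dst as placeholder paths:

import shutil

with open(src, encoding='iso-8859-11') as inp, \
     open(dst, 'w', encoding='utf-8') as out:
    # For text-mode files the third argument is the number of
    # characters read per chunk.
    shutil.copyfileobj(inp, out, 64 * 1024)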
Upvotes: 1