Reputation: 2389
Consider the tab-separated file foo.txt:
chrY 1208806 1208908 + .
chrY 1212556 1212620 + .
chrY 1465479 1466558 + .
The goal is to transform foo.txt into result.txt, like so:
chrY:1208806-1208908
chrY:1212556-1212620
chrY:1465479-1466558
This code works:
with open(filename, 'r') as f:
    for line in f:
        l = line.split()[0:3]
        result = f'{l[0]}:{l[1]}-{l[2]}'
        print(result)
But what if foo.txt were a giant file that cannot fit into memory? Saving every line in a list l wouldn't be feasible. How can I rewrite the code above as a generator/iterator?
Thanks.
Upvotes: 1
Views: 86
Reputation: 404
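I've needed to do this in the past, to process files of 50 GB or more. Iterating over a file object already reads it one line at a time, and l only ever holds the fields of the current line, so nothing accumulates in memory. What you need to do is just write out each line as you process it.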
with open('foo.txt', 'r') as src, open('result.txt', 'w') as tgt:
    for line in src:
        # Keep only the first three fields: chromosome, start, end.
        l = line.split()[0:3]
        result = f'{l[0]}:{l[1]}-{l[2]}\n'
        tgt.write(result)
(Note the inclusion of the newline character \n in result.)
Processing large files takes a while this way, but there's barely any increase in RAM usage.
I just tested your example copied many times over, and it worked fine.
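If you specifically want the generator form you asked about, here is a minimal sketch of how the same loop could be wrapped in one. The names format_regions and convert are hypothetical, and skipping lines with fewer than three fields is my assumption (the loop above would raise an IndexError on them instead).

```python
from typing import Iterable, Iterator


def format_regions(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield 'chrom:start-end' for each whitespace-separated line."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            # Assumption: silently skip blank or malformed lines.
            continue
        chrom, start, end = fields[:3]
        yield f'{chrom}:{start}-{end}'


def convert(src_path: str, tgt_path: str) -> None:
    """Stream src_path through format_regions into tgt_path."""
    with open(src_path, 'r') as src, open(tgt_path, 'w') as tgt:
        # The file object is itself a lazy line iterator, so only one
        # line is processed at a time.
        for region in format_regions(src):
            tgt.write(region + '\n')
```

Calling convert('foo.txt', 'result.txt') should then produce the same output as the loop above. The generator mainly helps if you want to reuse the formatting step elsewhere; memory usage is about the same either way.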
Upvotes: 1