moth

Implementing a generator/iterator in Python when reading a file

Consider the tab-separated file foo.txt:

chrY    1208806 1208908 +   .
chrY    1212556 1212620 +   .
chrY    1465479 1466558 +   .

The goal is to transform foo.txt into result.txt, which should look like this:

chrY:1208806-1208908
chrY:1212556-1212620
chrY:1465479-1466558

This code works:

with open('foo.txt', 'r') as f:
    for line in f:
        l = line.split()[0:3]  # first three columns: chrom, start, end
        result = f'{l[0]}:{l[1]}-{l[2]}'
        print(result)

But what if foo.txt were a giant file that cannot fit into memory? Then saving every line in a list l wouldn't be feasible. How can I rewrite the code above as a generator/iterator?

Thanks.


Answers (1)

Cerberton

I've needed to do this in the past to process files of 50 GB or more. All you need to do is write out each line as soon as you process it, instead of keeping the results around:

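with open('foo.txt', 'r') as src, open('result.txt', 'w') as tgt:
    for line in src:  # the file object yields one line at a time
        l = line.split()[0:3]  # chrom, start, end
        result = f'{l[0]}:{l[1]}-{l[2]}\n'
        tgt.write(result)  # write immediately instead of collecting results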

(Note the newline character \n at the end of result: unlike print, write() does not add one.)
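Since you asked specifically about generators: a minimal sketch is to move the per-line formatting into a generator function and let writelines() consume it. The names format_regions and convert are hypothetical, and skipping lines with fewer than three fields is an added assumption (the loop above would raise an IndexError on a blank line). io.StringIO stands in for the real files so the example runs on its own.

```python
import io


def format_regions(lines):
    """Hypothetical helper: lazily turn BED-like lines into 'chrom:start-end' strings.

    `lines` can be any iterable of strings, e.g. an open file object,
    which Python reads one line at a time.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:  # assumption: skip blank or truncated lines
            continue
        yield f'{fields[0]}:{fields[1]}-{fields[2]}\n'


def convert(src, tgt):
    """Hypothetical helper: writelines() pulls one formatted line at a time."""
    tgt.writelines(format_regions(src))


# Demo with in-memory files standing in for foo.txt and result.txt.
sample = io.StringIO('chrY\t1208806\t1208908\t+\t.\nchrY\t1212556\t1212620\t+\t.\n')
out = io.StringIO()
convert(sample, out)
print(out.getvalue(), end='')
```

For real files, pass open('foo.txt') and open('result.txt', 'w') to convert() inside a with statement, as in the loop above.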

Processing large files this way takes a while, but RAM usage barely increases, since iterating over a file object reads it line by line rather than loading it all at once.
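To illustrate why memory stays flat, the same transformation can also be written as a chain of generator expressions. This is a sketch rather than the answer's original code: io.StringIO again stands in for foo.txt and result.txt, and it assumes every line has at least three fields.

```python
import io

# io.StringIO stands in for open('foo.txt'); the pipeline is the same for a real file.
src = io.StringIO(
    'chrY\t1208806\t1208908\t+\t.\n'
    'chrY\t1212556\t1212620\t+\t.\n'
    'chrY\t1465479\t1466558\t+\t.\n'
)
tgt = io.StringIO()  # stands in for open('result.txt', 'w')

# Each stage is a generator expression, so nothing is read until writelines()
# asks for the next item; lines are processed one at a time, never collected
# into a list.
fields = (line.split() for line in src)
regions = (f'{f[0]}:{f[1]}-{f[2]}\n' for f in fields)
tgt.writelines(regions)

print(tgt.getvalue(), end='')
```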

I just tested this on your example copied many times over, and it worked fine.
