johnchase

Write multiple fasta entries using scikit-bio write

I am trying to read FASTA file entries using scikit-bio and then write certain entries back out to another file if they meet some requirement. The issue I am running into is that the .write method seems to open and close the file on every call, so each entry overwrites the previous one.

import skbio

f = 'seqs.fna'
seqs = skbio.io.read(f, format='fasta')
for seq in seqs:
    if seq.metadata['id'] in ['47P50SDHBQ1PA_0', '4OZ9UI889OL5V_1', '2EC8VWHQD1LW5_2']:
        print('True')
        seq.write('foo.txt')  # a file path is passed, so foo.txt is reopened on each call

Output:

True
True

I would expect two entries to be written to foo.txt in this case; however, only the last entry is present. How can I write all of the sequences that meet my criteria to a single file?
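
The symptom can be reproduced with plain Python file I/O. This sketch assumes, as the behavior above suggests, that passing a path to write opens the file in 'w' mode on every call; the helpers write_record_by_path and write_record_to_handle and the records are hypothetical, and scikit-bio is not needed.

```python
import os
import tempfile

def write_record_by_path(path, header, residues):
    # Mimics passing a file path: every call opens the file in 'w' mode,
    # which truncates anything written by earlier calls.
    with open(path, 'w') as fh:
        fh.write(f'>{header}\n{residues}\n')

def write_record_to_handle(fh, header, residues):
    # Writing to an already-open handle continues after the previous record.
    fh.write(f'>{header}\n{residues}\n')

records = [('seq_a', 'ACGT'), ('seq_b', 'GGCC')]  # hypothetical records

with tempfile.TemporaryDirectory() as tmpdir:
    by_path = os.path.join(tmpdir, 'by_path.fna')
    for header, residues in records:
        write_record_by_path(by_path, header, residues)

    by_handle = os.path.join(tmpdir, 'by_handle.fna')
    with open(by_handle, 'w') as fh:
        for header, residues in records:
            write_record_to_handle(fh, header, residues)

    with open(by_path) as fh:
        path_contents = fh.read()
    with open(by_handle) as fh:
        handle_contents = fh.read()

print(path_contents)    # only the last record survives
print(handle_contents)  # both records are present
```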

Answers (1)

jairideout

Open the output file once and pass the open file handle to write, instead of a file path:

import skbio

with open('output.fna', 'w') as output_fh:
    for seq in skbio.io.read('seqs.fna', format='fasta'):
        if seq.metadata['id'] in ['47P50SDHBQ1PA_0', '4OZ9UI889OL5V_1', '2EC8VWHQD1LW5_2']:
            # Writes to the handle that is already open, so records accumulate
            seq.write(output_fh)
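
To see the pattern in isolation, here is a sketch that runs without scikit-bio. FakeSequence is a hypothetical stand-in that only mimics the two things the loop uses, .metadata['id'] and .write(fh), and io.StringIO stands in for the file opened with open('output.fna', 'w'). The IDs are kept in a set here, a small swap from the list in the answer, for constant-time membership checks.

```python
import io

class FakeSequence:
    """Hypothetical stand-in for a scikit-bio sequence: has .metadata and .write(fh)."""

    def __init__(self, seq_id, residues):
        self.metadata = {'id': seq_id}
        self.residues = residues

    def write(self, fh):
        # Writes one FASTA record to an already-open text handle.
        fh.write(f">{self.metadata['id']}\n{self.residues}\n")

# A set gives O(1) membership tests; the answer's list works the same way.
wanted_ids = {'47P50SDHBQ1PA_0', '4OZ9UI889OL5V_1', '2EC8VWHQD1LW5_2'}

records = [
    FakeSequence('47P50SDHBQ1PA_0', 'ACGT'),
    FakeSequence('UNWANTED_9', 'TTTT'),
    FakeSequence('2EC8VWHQD1LW5_2', 'GGCC'),
]

output_fh = io.StringIO()  # in-memory stand-in for open('output.fna', 'w')
for seq in records:
    if seq.metadata['id'] in wanted_ids:
        seq.write(output_fh)

result = output_fh.getvalue()
print(result)
```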

Alternatively, you can pass a generator of sequences to skbio.io.write, which writes all of them to the output file in a single call:

import skbio

def filtered_seqs():
    # Lazily yield only the sequences whose ID is in the wanted list
    for seq in skbio.io.read('seqs.fna', format='fasta'):
        if seq.metadata['id'] in ['47P50SDHBQ1PA_0', '4OZ9UI889OL5V_1', '2EC8VWHQD1LW5_2']:
            yield seq

skbio.io.write(filtered_seqs(), format='fasta', into='output.fna')
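
The generator version works because the writer receives every record through one iterable, so it can open its output once and keep writing. Below is a plain-Python sketch of that shape; read_records and write_all are hypothetical helpers standing in for skbio.io.read and skbio.io.write, and an io.StringIO replaces the output file.

```python
import io

WANTED = {'47P50SDHBQ1PA_0', '4OZ9UI889OL5V_1', '2EC8VWHQD1LW5_2'}

def read_records():
    # Hypothetical reader standing in for skbio.io.read: yields records lazily.
    data = [('47P50SDHBQ1PA_0', 'ACGT'), ('UNWANTED_9', 'TTTT'), ('4OZ9UI889OL5V_1', 'CCAA')]
    for seq_id, residues in data:
        yield seq_id, residues

def filtered_records():
    # Same filtering generator shape as filtered_seqs() in the answer.
    for seq_id, residues in read_records():
        if seq_id in WANTED:
            yield seq_id, residues

def write_all(records, fh):
    # Hypothetical writer standing in for skbio.io.write: one handle, many records.
    count = 0
    for seq_id, residues in records:
        fh.write(f'>{seq_id}\n{residues}\n')
        count += 1
    return count

out = io.StringIO()
n_written = write_all(filtered_records(), out)
print(n_written)
print(out.getvalue())
```

Because both functions are generators, no record is read or filtered until write_all asks for it, so the whole input never has to sit in memory at once.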
