Drawing multiple sequences from 1 file, based on shared fields in another file

Question

I'm trying to run a python script to draw sequences from a separate file (merged.fas), in respect to a list (gene_fams_eggnog.txt) I have as output from another program.

The code is as follows:

from Bio import SeqIO
import os, sys, re
from collections import defaultdict
sequences = "merged.fas"
all_seqs = SeqIO.index(sequences, "fasta")
gene_fams = defaultdict(list)

gene_fams_file = open("gene_fams_eggnog.txt")
for line in gene_fams_file:
     fields = re.split("	", line.rstrip())
     gene_fams[fields[0]].append[fields[1]]

for fam in gene_fams.keys():
    output_filename = str(fam) + ".fasta"
    outh = open(output_filename, "w")
    for id in gene_fams[fam]:
        if id in all_seqs:
            outh.write(">" + all_seqs[id].description + "
" + str(all_seqs[id].seq) + "
")
        else:
            print "Uh oh! Sequence with ID " + str(id) + " is not in the all_seqs file!"
            quit()
    outh.close()

The list looks like this:

1 Saccharomycescerevisiae_DAA09367.1
1 bieneu_EED42827.1
1 Asp_XP_749186.1
1 Mag_XP_003717339.1
2 Mag_XP_003716586.1
2 Mag_XP_003709453.1
3 Asp_XP_749329.1

The field 0 denotes a grouping based by a similarity between the sequences. The script was meant to take all the sequences from merged.fas that correspond to the code in the field 1 and write them into a file base on field 0.

So in the case of the portion of the list I have shown, all the sequences that have a 1 in field 0 (Saccharomycescerevisiae_DAA09367.1, bieneu_EED42827.1, Asp_XP_749186.1, Mag_XP_003717339.1) would have been written into a file called 1.fasta. This should continue from 2.fasta-however many groups there are.

So this has worked, however it doesn't include all the sequences that are in the group, it'll only include the last one to be listed as a part of that group. Using my example above, I'd only have a file (1.fasta) with one sequence (Mag_XP_003717339.1), instead of all four.

Any and all help is appreciated, Thanks, JT

Drawing multiple sequences from 1 file, based on shared fields in another file

Answers (1)

Related Questions