Reputation: 3432
I have an alignment FASTA file:
>SEQ1
AGGGGGCGGATAGGCAG-C--AGGCGCGGAGAGCGGCAGCG---GG---GGACG-AC-
AGGAACACCTATACTTcCTTCTAC-CAGACGAAC-------CGAGACa
>SEQ2
TGGGGGCGGATGGGCAG----AGGCGCGGAGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTAHACCCcCTTCTAC-CAGACGAAC-------CGACACa
>SEQ3
TGGGGGCGGATGGGCAG-T--AGGCGCGGATAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTATACCCcCTTCTAC-CAGACGAAC-------CGACACa
>SEQ4
TGGGGGCGGATGGGCAG-A--AGGCGCGGTGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTACACCCcCTTCTAC-CAGACGAAC-------CGACACa
>SEQ5
TGGGGGCGGATGGGCAG-G--AGGCGCGGAGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTATACCCcCTTCTAC-CAGACGAAC-------CGACACa
and I would like to open this file, and create all the possible pair between SEQn
(without the same sequence in a pair)
for instance if there are 5 SEQ, then there will be n(n-1)/2
5(5-1)/2 = 10 groups
so
SEQ1 vs SEQ2
SEQ1 vs SEQ3
SEQ1 vs SEQ4
SEQ1 vs SEQ5
SEQ2 vs SEQ3
SEQ2 vs SEQ4
SEQ2 vs SEQ5
SEQ3 vs SEQ4
SEQ3 vs SEQ5
SEQ4 vs SEQ5
and I want to create 10 new FASTA alignment file:
SEQ1 _vs_SEQ2.fa
>SEQ1
AGGGGGCGGATAGGCAG-C--AGGCGCGGAGAGCGGCAGCG---GG---GGACG-AC-
AGGAACACCTATACTTcCTTCTAC-CAGACGAAC-------CGAGACa
>SEQ2
TGGGGGCGGATGGGCAG----AGGCGCGGAGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTAHACCCcCTTCTAC-CAGACGAAC-------CGACACa
.
.
.
SEQ4 _vs_SEQ5.fa
>SEQ4
TGGGGGCGGATGGGCAG-A--AGGCGCGGTGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTACACCCcCTTCTAC-CAGACGAAC-------CGACACa
>SEQ5
TGGGGGCGGATGGGCAG-G--AGGCGCGGAGAGCGGCAGCG-A-GG---GGACG-AC-
AGGAACATCTATACCCcCTTCTAC-CAGACGAAC-------CGACACa
Does someone have a fast idea (I have more then 122 taxa within each group which mean that I need to create 7381 files and 200 times)?
SO far I used:
records=SeqIO.to_dict(SeqIO.parse("mysequence.fasta", "fasta"))
list_taxa=list(records)
#Create all possible pairs
List_pairs=[(list_taxa[i],list_taxa[j]) for i in range(len(list_taxa)) for j in range(i+1, len(list_taxa))]
count=0
SEQ_NAME="Name_seq"
for i in List_pairs:
new_pair_name=i[0]+"_"+i[1]+"_"+SEQ_NAME
output_file=open("/My_dir_where_to_create_the_files/"+new_pair_name+".fa","w")
for a in i:
print(">",records[a].id,sep="",file=output_file)
print(records[a].seq,file=output_file)
print(int(count),"/",len(list(records)))
count+=1
but it is not so efficient for the part where I create the pairs alignment files...
Upvotes: 0
Views: 154
Reputation: 88
Assuming all the files are in the same directory, this should be sufficient to have a list of all of the combos possible.
import os
ListofFiles = []
for file in os.path(WhateverYourDirectoryContainingTheFilesIs):
if file.startswith('SEQ'):
ListofFiles.append(file)
length = len(ListofFiles)
FileCombos = []
for i in range(length):
for j in range(i+1, length):
FileCombos.append(ListofFiles[i] + " vs " + ListofFiles[j])
From here, it shouldn't be too difficult to create your files as you need them, in your preferred method. I may come back to this tomorrow for a more pythonic solution, if no one else cooks one up before then.
Upvotes: 1