Reputation: 59
To find a mutation such as S104R (region 2288681 to 2289241, for pyrazinamide), we first have to remove '-' characters (to strip any insertions/deletions present in the FASTA file), then take the reverse complement, and then look up the codon at the given codon number (here, 104). I have found the answer using basic string functions, but I would like something cleaner and simpler, if that is possible with the Biopython library.
Upvotes: -1
Views: 141
Reputation: 59
So the following code works fine for me:
from Bio import SeqIO
sample_file = list(SeqIO.parse('fasta_file_location', 'fasta'))  # two records (reference and patient sequence); list() is needed because SeqIO.parse returns an iterator, which cannot be indexed
ref=str(sample_file[0].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]
pat=str(sample_file[1].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]
print("ref: ", ref, "pat: ", pat)  # output -> ref: AGC, pat: CGG
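The chained replace calls above can probably be folded into a single translation table. Here is a minimal plain-Python sketch, assuming upper-case, unambiguous bases (A, C, G, T). The names `COMPLEMENT`, `reverse_complement` and `codon_from_minus_strand` are illustrative and not part of the original code, and the toy sequence is made up.

```python
# Complement table for unambiguous upper-case DNA bases (assumption: no N, no lower case).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def codon_from_minus_strand(sequence, start, end, codon_number):
    """Return codon `codon_number` (1-based) of a reverse-strand gene.

    `start` and `end` are used as a plain Python slice on the ungapped
    sequence, the same way the one-liners above use them.
    """
    gene = reverse_complement(sequence.replace("-", "")[start:end])
    offset = (codon_number - 1) * 3
    return gene[offset:offset + 3]


# Toy example with internal gaps (made-up sequence, not real genome data).
print(codon_from_minus_strand("AATT-AGCT-CATGG", 2, 11, 2))  # AGC
```

With the real file, the call would look like `codon_from_minus_strand(str(sample_file[0].seq), 2288681, 2289241, 104)`.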
But the code below does not work for me:
ref=sample_file[0].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
pat=sample_file[1].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
It would be good to have a more convenient approach, since the latter uses Biopython functions, so please help if you know how to make it better.
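One likely reason the Biopython version gives a different result: `strip("-")` only trims gap characters from the two ends of the sequence, while `replace('-', '')` in the string version removes every gap, so the slice coordinates no longer line up when gaps occur inside the sequence. A minimal sketch of a Biopython-based variant follows, assuming the same slice coordinates as the string-based code. The helper name `codon_on_minus_strand` and the toy sequence are hypothetical, and the gaps are removed through `str` so the sketch does not depend on which `Seq` methods a given Biopython version provides.

```python
from Bio.Seq import Seq


def codon_on_minus_strand(seq, start, end, codon_number):
    """Return codon `codon_number` (1-based) of a reverse-strand gene as a Seq.

    `start` and `end` are used as a plain Python slice on the ungapped
    sequence, the same way the string-based code in the post uses them.
    """
    # Remove every gap character, not only leading/trailing ones.
    ungapped = Seq(str(seq).replace("-", ""))
    gene = ungapped[start:end].reverse_complement()
    offset = (codon_number - 1) * 3
    return gene[offset:offset + 3]


# Toy example with internal gaps (made-up sequence, not real genome data).
toy = Seq("AATT-AGCT-CATGG")
codon = codon_on_minus_strand(toy, 2, 11, 2)
print(codon, codon.translate())  # AGC S
```

With the records loaded into a list, the call would be `codon_on_minus_strand(sample_file[0].seq, 2288681, 2289241, 104)`, and the same for `sample_file[1]`. Calling `.translate()` on the two codons gives the amino acids directly, which makes a change such as S to R easier to read off.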
Upvotes: 0