optimusPrime

Reputation: 59

How to find mutations in a reverse-oriented gene (like pncA) from a TB sequencing FASTA file using the Biopython library in Python 3?

To find a mutation such as S104R (region 2288681 to 2289241, for pyrazinamide), we first have to remove the '-' characters (to strip any insertions/deletions present in the FASTA file), then take the reverse complement of the region, and then look at the codon with the given number (here 104). I have found the answer using basic string functions, but I would like something cleaner and simpler, if that is possible with the Biopython library.

Upvotes: -1

Views: 141

Answers (1)

optimusPrime

Reputation: 59

So the following code works fine for me:

from Bio import SeqIO
# SeqIO.parse returns an iterator, so wrap it in list() to index the records.
# There are two records in the file: the reference and the patient sequence.
sample_file = list(SeqIO.parse('fasta_file_location', 'fasta'))

ref=str(sample_file[0].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]
pat=str(sample_file[1].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]

print("ref: ", ref, "pat: ", pat)  # output -> ref: AGC, pat: CGG

But the code below does not work for me:

ref=sample_file[0].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
pat=sample_file[1].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
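
A likely reason for the difference: strip("-") only trims gap characters from the two ends of a sequence, while replace('-', '') removes every one of them, so the two versions can end up slicing different coordinates. As far as I know, Seq.strip follows the same semantics as str.strip. Separately, the object returned by SeqIO.parse is an iterator, so indexing it with [0] raises a TypeError unless it is first wrapped in list(). A minimal sketch of the strip/replace difference on a made-up aligned string (the value of `aligned` is hypothetical toy data):

```python
# Toy aligned sequence with gaps at both ends and one in the middle.
aligned = "--ACG-TTA--"

# strip only trims the given characters from both ends; the inner gap survives.
stripped = aligned.strip("-")
# replace removes every occurrence, which is what the coordinate slicing needs.
ungapped = aligned.replace("-", "")

print(stripped)  # ACG-TTA
print(ungapped)  # ACGTTA
```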

It would be good to have a more convenient approach, since the latter one uses Biopython functions, so please help if you know how to make it work.
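
One possible way to get the Biopython version working is sketched below. It is only a sketch under a few assumptions: the helper name `codon_on_reverse_strand` is made up for illustration, the gaps are removed through a plain string and the result is wrapped back into a Seq (so it does not depend on any particular Biopython release), and the coordinates are used as 0-based, end-exclusive slice bounds exactly as in the string version above. The toy sequence is invented test data, not real TB sequence.

```python
from Bio.Seq import Seq


def codon_on_reverse_strand(seq, start, end, codon_number):
    """Return codon `codon_number` (1-based) of a gene on the reverse strand.

    `start` and `end` are 0-based, end-exclusive slice bounds on the
    gap-free forward strand (hypothetical helper, for illustration only).
    """
    # Remove every '-' (strip would only trim the ends), then rebuild a Seq.
    ungapped = Seq(str(seq).replace("-", ""))
    # Slice the gene region and flip it to the coding strand.
    gene = ungapped[start:end].reverse_complement()
    offset = (codon_number - 1) * 3
    return gene[offset:offset + 3]


# Toy data: the forward strand holds the reverse complement of ATGAGCTAA,
# with one alignment gap inserted in the middle.
toy = Seq("GGTTAG-CTCATCC")
codon = codon_on_reverse_strand(toy, 2, 11, 2)
print(codon, codon.translate())  # AGC S
```

With the real file, the same helper would be called as codon_on_reverse_strand(sample_file[0].seq, 2288681, 2289241, 104) for the reference and with sample_file[1].seq for the patient, provided sample_file is a list of records. Comparing the translate() results of the two codons then gives the amino acid change directly.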

Upvotes: 0
