Reputation: 43
I have a list of sequence start coordinates, and I want to retrieve the sequences at those coordinates from a genome FASTA file. I tried grep and R but didn't get the desired output.
List of coordinates:
10001276
10001433
10002237
10002342
10002617
10002736
10003584
10003832
10005377
1000567
Which approach would be most efficient?
Upvotes: 0
Views: 868
Reputation: 410
I suggest you try BioPython:
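from Bio import SeqIO
# Read a single-record file; for a FASTA genome, pass the FASTA path and "fasta" as the format
record = SeqIO.read("NC_006581.gbk", "genbank")
# Slice with a colon (0-based, end-exclusive); a comma inside the brackets is not a slice
print("\nPosition 10001276: ", record.seq[10001276:10001276+1])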
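To handle the whole coordinate list rather than one position, something along these lines might work. This is only a sketch using the standard library, not Biopython. It assumes the genome is a single-record FASTA file, that the coordinates are 1-based starts, and that you want a fixed number of bases from each start. The question does not say how long each sequence should be, so the `length` parameter and the function names `read_fasta_sequence` and `extract_at` are my own placeholders.

```python
def read_fasta_sequence(path):
    """Return the sequence of the first record in a FASTA file as one string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop when a second record begins
                seen_header = True
            elif line:
                parts.append(line)
    return "".join(parts)


def extract_at(sequence, starts, length, one_based=True):
    """Map each start coordinate to the `length` bases beginning at it.

    Assumption: coordinates are 1-based unless one_based=False.
    Coordinates that fall outside the sequence are skipped.
    """
    result = {}
    for start in starts:
        begin = start - 1 if one_based else start
        if 0 <= begin < len(sequence):
            result[start] = sequence[begin:begin + length]
    return result
```

With a genome file called, say, genome.fa and the coordinates saved one per line in coords.txt (both names are made up), you would load the sequence once with `read_fasta_sequence("genome.fa")`, read the integers from coords.txt into a list, and pass both to `extract_at` with whatever length you need. Loading the sequence once and slicing it repeatedly avoids rescanning the file for every coordinate, which is why this tends to be faster than grep for this task.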
Upvotes: 0