Reputation: 29
I have a DNA text file, and I need to use lists and loops specifically to count the occurrences of each dinucleotide pair (e.g. AA, AC, AT, AG, CA, CC, etc.). I then need to use lists and loops again to write the counts to a new text file as a table with two tab-separated columns: the dinucleotide sequence and its count. I know how to do this the long way (store each pair in a variable, count its occurrences with count, then open a text file and write each count individually), but I am only just starting to learn about lists and loops and am confused about how to do it that way.
This is how I currently do it. dna1.txt is a (random) example DNA sequence text file on my computer:
agggaatcgctggtgaagaggttgtgacctcttataaccccattgttaatgaggtccacg ctaagtaatgagtggctggtataggtgacgtctagaagtcatttctgtacagttactgcc gtggatatatccattaggacgacactggggtgctcccacgcaccacgtgtacaggacgac tgcgatgatatagaaggtgagcttaaaacgttctacaaccccaatgaatcatagccgggt agattgccaggcgtgtggtaacgggtacgtggcggatctcgtccagtatgccgcagtcac acccgaatctttcgtcgactacggagcgactcgtatcgagacgggcttgaattgactcct catggattaggctgaggtcaaccttcgcatggagcctgggcatttaaaggtcgactgtcg
with open("dna1.txt") as dna_txt:
    dna_txtcontents = dna_txt.read()
aa_count = dna_txtcontents.count("aa")
print(aa_count)
I then repeat this for each pair and store each individual count in a new text file. How can I make this easier by using lists and loops, both to count the occurrences of each pair and to write the counts to a new text file? Also, how do I make sure the program works whether the sequence is uppercase or lowercase?
Thank you!!
Upvotes: 1
Views: 651
Reputation: 2374
You can use itertools.product to create all dinucleotide pairs. To make the count case-insensitive, convert the whole sequence to a single case (uppercase in the code below) before counting.
import itertools
with open("dna1.txt") as dna_txt:
    dna_txtcontents = dna_txt.read().upper()
nt_pair_counts = {}
for nt_pair in itertools.product('ACTG', repeat=2):
    nt_pair = "".join(nt_pair)
    nt_pair_counts[nt_pair] = dna_txtcontents.count(nt_pair)
with open("out.txt", "wt", encoding="utf-8") as fd:
    for nt, count in nt_pair_counts.items():
        print(nt, count, sep="\t", file=fd)
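Two caveats about str.count may matter here. It counts non-overlapping matches only, so "AAA".count("AA") is 1 rather than 2, and it will miss any pair that straddles a line break or space in the file. If overlapping pairs are what you want, and you need to stick to plain lists and loops, a minimal sketch could look like the following. The function names (build_pairs, count_pairs, write_table, main) are made up for illustration, and it assumes every adjacent pair of bases should be counted once whitespace is removed.

```python
NUCLEOTIDES = ["A", "C", "G", "T"]


def build_pairs():
    """Return all 16 dinucleotides, built with two nested loops."""
    pairs = []
    for first in NUCLEOTIDES:
        for second in NUCLEOTIDES:
            pairs.append(first + second)
    return pairs


def count_pairs(sequence):
    """Count overlapping dinucleotides.

    Returns two parallel lists: the pairs and their counts.
    """
    # Uppercase makes the count case-insensitive; split() + join()
    # removes newlines and spaces so pairs across line breaks are seen.
    cleaned = "".join(sequence.upper().split())
    pairs = build_pairs()
    counts = [0] * len(pairs)
    # Slide a two-character window over the sequence, one base at a time.
    for i in range(len(cleaned) - 1):
        window = cleaned[i:i + 2]
        if window in pairs:  # skip anything that is not A/C/G/T
            counts[pairs.index(window)] += 1
    return pairs, counts


def write_table(pairs, counts, path):
    """Write one 'pair<TAB>count' row per line."""
    with open(path, "w", encoding="utf-8") as out:
        for pair, count in zip(pairs, counts):
            out.write(pair + "\t" + str(count) + "\n")


def main(in_path, out_path):
    """Read a sequence file and write the dinucleotide table."""
    with open(in_path) as dna_file:
        pairs, counts = count_pairs(dna_file.read())
    write_table(pairs, counts, out_path)
```

Calling main("dna1.txt", "out.txt") would then produce the table. Looking up each window with pairs.index is slower than a dictionary, but it keeps everything in lists, which seems to be what the exercise asks for.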
Upvotes: 1