BountifulDawn

Reputation: 25

Nested Loops in Python: Storing Results in a Single Dictionary

Good evening SO,

I am currently working on a program to learn more about Python as I continue my undergraduate degree. I am attempting to create a bioinformatics program that uses Markov models to predict certain P(x) statements. I am cleaning up my code, as I have found a TON of repetition. I am NOT asking for an answer, more for advice or perhaps a nudge in a direction that keeps me moving forward in a positive and Python-centric mindset.

Is there any way in Python that I can turn

aa_count = markov_data_set.count('AA')
at_count = markov_data_set.count('AT')
ag_count = markov_data_set.count('AG')
ac_count = markov_data_set.count('AC')
tt_count = markov_data_set.count('TT')
ta_count = markov_data_set.count('TA')
tg_count = markov_data_set.count('TG')
tc_count = markov_data_set.count('TC')
cc_count = markov_data_set.count('CC')
ca_count = markov_data_set.count('CA')
cg_count = markov_data_set.count('CG')
ct_count = markov_data_set.count('CT')
gg_count = markov_data_set.count('GG')
ga_count = markov_data_set.count('GA')
gt_count = markov_data_set.count('GT')
gc_count = markov_data_set.count('GC')

into something simpler? I've been reading several books on Python (Crashcourse on Python and Primers to Scientific Coding with Python), and I believe I can use loops or nested loops to make something shorter and more organized. Here is what I have tried:

di_nucleotide = ('AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC')
nucleotide_count = ()
nucleotide_frequency = []

for binomials in di_nucleotide:
     di_nucleotide.count()


The problem is, sadly, that I get stuck from there, which is a bit discouraging. What I want the end product to be is something that stores Var1 and Var2 in a single dictionary that I can store or call later, while also keeping those two variables separate as needed.

di_nucleotide = ('AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC')
nucleotide_count = (int1, int2, int3, int4, ...)
nucleotide_frequency = {'AA': Count, 'AT': Count, 'AG': Count, ...}

This will be my first post on SO. I recognize this may not be the best avenue to ask for advice, but if there is anything I can do to make my posts better in the future, please let me know so I may improve.

As always, thank you, everyone, and have an amazing day! I look forward to continuing my journey on coding.

Upvotes: 1

Views: 129

Answers (2)

Jan Christoph Terasa

Reputation: 5935

Use itertools.product to generate the pairs:

import itertools

bases = 'ACGT'
nucs = [''.join(pair) for pair in itertools.product(bases, repeat=2)]
# ['AA', 'AC', 'AG', 'AT', 'CA', ...]

You can then call count() once per pair inside a dictionary comprehension, replacing your individual calls:

counts = {nuc: markov_data_set.count(nuc) for nuc in nucs}

counts is a dictionary of your results. The keys are 'AA', 'AC' and so on.
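
One thing to watch for with Markov models: str.count counts non-overlapping matches, so 'AAAT'.count('AA') is 1 even though the pair 'AA' occurs at two positions. If overlapping pairs are what you need, a minimal sketch is below. It assumes markov_data_set is a plain string of A, C, G and T; the function name count_dinucleotides and the example sequence are made up for illustration.

```python
from collections import Counter
from itertools import product


def count_dinucleotides(sequence, bases="ACGT"):
    """Count overlapping dinucleotides in a sequence string (hypothetical helper)."""
    # Pair each character with its successor: 'AAAT' -> 'AA', 'AA', 'AT'
    observed = Counter(a + b for a, b in zip(sequence, sequence[1:]))
    # Report every possible pair, including those that never occur (count 0)
    pairs = ("".join(p) for p in product(bases, repeat=2))
    return {pair: observed[pair] for pair in pairs}


example = "AAAT"  # made-up sequence, stands in for markov_data_set
counts = count_dinucleotides(example)
print(counts["AA"], example.count("AA"))
```

Whether overlapping or non-overlapping counts are right depends on how your model defines a transition, so treat this as an option rather than a correction.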

Upvotes: 2

warped

Reputation: 9481

You can store everything in a dictionary, which you build on the fly:

# initialise dictionary and total counts
nucleotide_counts = {}
total_counts = 0

# loop through all 16 dinucleotides
for dn in ['AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC']:
    # count occurrences, store them in the dictionary, and update the running total
    counts = markov_data_set.count(dn)
    nucleotide_counts[dn] = counts
    total_counts += counts

From there, you can generate the frequencies:

frequencies = {}
for dn, counts in nucleotide_counts.items():
    frequencies[dn] = counts / total_counts
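
The division above raises ZeroDivisionError if none of the pairs occur (for example, an empty sequence). A minimal sketch of the same step with a guard, written as a dictionary comprehension, is below. The helper name to_frequencies and the example counts are made up for illustration, and returning 0.0 for an empty total is an assumption about what you would want in that case.

```python
def to_frequencies(counts):
    """Turn a {pair: count} dict into {pair: relative frequency} (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        # Assumption: report 0.0 everywhere rather than raising on empty input
        return {pair: 0.0 for pair in counts}
    return {pair: count / total for pair, count in counts.items()}


example_counts = {"AA": 2, "AT": 1, "TA": 1}  # made-up counts
frequencies = to_frequencies(example_counts)
print(frequencies)
```

This keeps the counts and the frequencies in two separate dictionaries that share the same keys, so either one can be looked up by pair later.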

Upvotes: 0
