Reputation: 369
I have data stored in a list of lists organized like so:
lst = [
    ['FHControl', 'G', 'A'],
    ['MNHDosed', 'G', 'C'],
]
For each row in lst, row[0] falls into one of 12 categories (I've listed two in the sample above). For row[1] and row[2], I am only concerned with 6 combinations of these letters. That gives 72 possible combinations per row in lst, and I need to count the instances of each combination without writing dozens of nested if statements.
I am attempting to create two functions to parse these lists and bin the occurrences of these 72 combinations. How can I use two functions like the ones I am beginning to write below to update these variables? Do I need to construct the dictionaries as class variables so that I can continue to update them as I iterate through both functions? Any guidance would be great!
Here is the code I have currently that initializes all 72 variables into 6 dictionaries (for the 6 combinations of letters in row[1] and row[2]):
def baseparser(lst):
    TEMP = dict.fromkeys('FHDosed FHControl FNHDosed FNHControl '
                         'FTDosed FTControl MHDosed MHControl '
                         'MNHDosed MNHControl MTDosed MTControl'.split(), 0)
    TRI_1, TRI_2, TRV_1, TRV_2, TRV_3, TRV_4 = ([dict(TEMP) for i in range(6)])
    for row in lst:
        if row[0] == 'FHDosed':
            binner(row[0], row[1], row[2])
        if row[0] == 'FHControl':
            binner(row[0], row[1], row[2])
        # etc.

def binner(key, q, s):
    if (q == 'G' and s == 'A') or (q == 'C' and s == 'T'):
        TRI_1[key] += 1
    elif (q == 'A' and s == 'G') or (q == 'T' and s == 'C'):
        TRI_2[key] += 1
    elif (q == 'G' and s == 'T') or (q == 'C' and s == 'A'):
        TRV_1[key] += 1
    elif (q == 'G' and s == 'C') or (q == 'C' and s == 'G'):
        TRV_1[key] += 1
    elif (q == 'A' and s == 'T') or (q == 'T' and s == 'A'):
        TRV_1[key] += 1
    elif (q == 'A' and s == 'C') or (q == 'T' and s == 'G'):
        TRV_1[key] += 1
Upvotes: 2
Views: 268
Reputation: 880547
Your code could be simplified to:
TEMP = dict.fromkeys('''FHDosed FHControl FNHDosed FNHControl FTDosed FTControl MHDosed
    MHControl MNHDosed MNHControl MTDosed MTControl'''.split(), 0)
TRI_1, TRI_2, TRV_1, TRV_2, TRV_3, TRV_4 = [TEMP.copy() for i in range(6)]
dmap = {
    ('G', 'A'): TRI_1,
    ('C', 'T'): TRI_1,
    ('A', 'G'): TRI_2,
    ('T', 'C'): TRI_2,
    ('G', 'C'): TRV_1,
    ('C', 'G'): TRV_1,
    ('A', 'T'): TRV_1,
    ('T', 'A'): TRV_1,
}
for row in lst:
    key, q, s = row
    dmap[q, s][key] += 1
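On the question of class variables: they should not be needed, because a dict is mutable and a helper function can update a dict that is passed to it. The following is only a sketch of that idea, keeping the two-function shape from the question. The extra `bins` parameter on `binner` and the returned tuple are illustrative choices, not part of the original code, and only the eight pairs from the mapping above are covered.

```python
CATEGORIES = ('FHDosed FHControl FNHDosed FNHControl FTDosed FTControl '
              'MHDosed MHControl MNHDosed MNHControl MTDosed MTControl').split()


def binner(bins, key, q, s):
    """Increment the count for `key` in whichever dict the (q, s) pair selects."""
    bins[q, s][key] += 1  # mutates the dict object created in baseparser


def baseparser(lst):
    """Create the count dicts, bin every row, and return the filled dicts."""
    temp = dict.fromkeys(CATEGORIES, 0)
    tri_1, tri_2, trv_1 = (dict(temp) for _ in range(3))
    # Only the eight pairs from the mapping above; any other pair raises KeyError.
    bins = {
        ('G', 'A'): tri_1, ('C', 'T'): tri_1,
        ('A', 'G'): tri_2, ('T', 'C'): tri_2,
        ('G', 'C'): trv_1, ('C', 'G'): trv_1,
        ('A', 'T'): trv_1, ('T', 'A'): trv_1,
    }
    for key, q, s in lst:
        binner(bins, key, q, s)
    return tri_1, tri_2, trv_1


tri_1, tri_2, trv_1 = baseparser([['FHControl', 'G', 'A'],
                                  ['MNHDosed', 'G', 'C']])
print(tri_1['FHControl'], trv_1['MNHDosed'])  # 1 1
```

Passing the dicts explicitly keeps both functions free of global or class-level state, which tends to make them easier to test.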
Another possibility is to use one dict of dicts instead of 6 dicts:
TEMP = dict.fromkeys('''FHDosed FHControl FNHDosed FNHControl FTDosed FTControl MHDosed
    MHControl MNHDosed MNHControl MTDosed MTControl'''.split(), 0)
TR = {key: TEMP.copy() for key in ('TRI_1', 'TRI_2', 'TRV_1', 'TRV_2', 'TRV_3', 'TRV_4')}
dmap = {
    ('G', 'A'): 'TRI_1',
    ('C', 'T'): 'TRI_1',
    ('A', 'G'): 'TRI_2',
    ('T', 'C'): 'TRI_2',
    ('G', 'C'): 'TRV_1',
    ('C', 'G'): 'TRV_1',
    ('A', 'T'): 'TRV_1',
    ('T', 'A'): 'TRV_1',
}
lst = [
    ['FHControl', 'G', 'A'],
    ['MNHDosed', 'G', 'C']
]
for row in lst:
    key, q, s = row
    TR[dmap[q, s]][key] += 1
print(TR)
The advantage of doing it this way is that you have fewer dicts in your namespace, and it may be easier to refactor the code later using a dict of dicts instead of hard-coding 6 dicts.
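One caveat: the `dmap` shown lists eight of the twelve ordered pairs that appear in the question's `binner`, so a row such as `['FHDosed', 'G', 'T']` would raise a `KeyError`. Below is a sketch that covers all twelve. It assumes the four transversion branches in the question were meant to feed `TRV_1` through `TRV_4` in the order they are written, which the question does not confirm; under that assumption `('G', 'C')` lands in `TRV_2` rather than `TRV_1`. The names `PAIR_CLASSES` and `count_rows` are hypothetical.

```python
CATEGORIES = ('FHDosed FHControl FNHDosed FNHControl FTDosed FTControl '
              'MHDosed MHControl MNHDosed MNHControl MTDosed MTControl').split()

# Assumed assignment of pairs to classes (a guess based on branch order).
PAIR_CLASSES = {
    'TRI_1': [('G', 'A'), ('C', 'T')],
    'TRI_2': [('A', 'G'), ('T', 'C')],
    'TRV_1': [('G', 'T'), ('C', 'A')],
    'TRV_2': [('G', 'C'), ('C', 'G')],
    'TRV_3': [('A', 'T'), ('T', 'A')],
    'TRV_4': [('A', 'C'), ('T', 'G')],
}

# Invert the table: (q, s) -> class name.
dmap = {pair: name for name, pairs in PAIR_CLASSES.items() for pair in pairs}


def count_rows(rows):
    """Return a fresh dict of dicts with one count per (class, category)."""
    tr = {name: dict.fromkeys(CATEGORIES, 0) for name in PAIR_CLASSES}
    for key, q, s in rows:
        name = dmap.get((q, s))
        if name is not None:  # silently skip pairs outside the twelve
            tr[name][key] += 1
    return tr


TR = count_rows([['FHControl', 'G', 'A'], ['FHDosed', 'G', 'T']])
print(TR['TRI_1']['FHControl'], TR['TRV_1']['FHDosed'])  # 1 1
```

Whether unlisted pairs should be skipped, as here, or reported as errors depends on the data; swapping `dmap.get` for plain indexing restores the `KeyError`.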
Following up on Midnighter's suggestion, if you have pandas, you could replace the dict of dicts with a DataFrame. Then the frequency of pairs could be computed using pd.crosstab like this:
import pandas as pd
dmap = {
    'GA': 'TRI_1',
    'CT': 'TRI_1',
    'AG': 'TRI_2',
    'TC': 'TRI_2',
    'GC': 'TRV_1',
    'CG': 'TRV_1',
    'AT': 'TRV_1',
    'TA': 'TRV_1',
}
lst = [
    ['FHControl', 'G', 'A'],
    ['MNHDosed', 'G', 'C']
]
df = pd.DataFrame(lst, columns=['key', 'q', 's'])
df['tr'] = (df['q']+df['s']).map(dmap)
print(df)
#          key  q  s     tr
# 0  FHControl  G  A  TRI_1
# 1   MNHDosed  G  C  TRV_1
# Current pandas names these arguments index/columns (older releases used rows/cols).
print(pd.crosstab(index=df['key'], columns=df['tr']))
yields
tr         TRI_1  TRV_1
key
FHControl      1      0
MNHDosed       0      1
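If pandas is not available, a similar tally can be approximated with the standard library. This is a sketch rather than a drop-in replacement for `pd.crosstab`: it uses `collections.Counter` keyed on `(key, class)` tuples and reuses the same eight-entry `dmap` as above.

```python
from collections import Counter

dmap = {
    'GA': 'TRI_1', 'CT': 'TRI_1',
    'AG': 'TRI_2', 'TC': 'TRI_2',
    'GC': 'TRV_1', 'CG': 'TRV_1',
    'AT': 'TRV_1', 'TA': 'TRV_1',
}
lst = [
    ['FHControl', 'G', 'A'],
    ['MNHDosed', 'G', 'C'],
]

# One counter entry per (category, class) combination that actually occurs.
counts = Counter((key, dmap[q + s]) for key, q, s in lst)

for (key, tr), n in sorted(counts.items()):
    print(key, tr, n)
```

A `Counter` returns 0 for combinations it has never seen, so `counts['FHDosed', 'TRI_1']` can be read without initialising all 72 cells first.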
Upvotes: 4