Reputation: 125
I am working on some FASTA-like sequences (not FASTA, but a similar format I have defined for a culled PDB set from the PISCES server).
I have a small number of sequences called nCatSeq, and for each of them there are multiple nBasinSeq. I go through a large PDB file and want to store, for each nCatSeq, the corresponding nBasinSeq values in a dictionary without redundancies. The code snippet that does this is given below.
nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=nBasinSeq
else:
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
    else:
        pass
For one nCatSeq, I get the following:
'4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')
What I want, however, is:
'4241': ('VUVV', 'DDRV', 'DDVG', 'VUVV')
I don't want all the extra brackets produced by the following statement (see the code snippet above):
potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
Is there a way to do this?
Upvotes: 7
Views: 274
Reputation: 113915
Your question boils down to flattening a nested tuple and eliminating redundant entries:
def flatten(nested, answer=None):
    # Recursively collect every non-tuple element of `nested` into a flat list.
    if answer is None:
        answer = []
    if isinstance(nested, tuple):
        for n in nested:
            flatten(n, answer)
    else:
        # A single string (a key that only ever saw one basin) is kept whole.
        answer.append(nested)
    return answer
So, with your nested dictionary:
for k in nested_dict:
    nested_dict[k] = tuple(flatten(nested_dict[k]))
If you want to eliminate duplicate entries:
for k in nested_dict:
    nested_dict[k] = tuple(set(flatten(nested_dict[k])))
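One caveat with set() is that it does not keep the elements in their original order. If order matters, something like the following sketch may help. It assumes Python 3.7+ (where dict preserves insertion order), defines its own small flatten helper so it runs on its own, and uses made-up data shaped like the output in the question.

```python
def flatten(nested):
    """Return the non-tuple elements of an arbitrarily nested tuple as a flat list."""
    if not isinstance(nested, tuple):
        return [nested]
    flat = []
    for element in nested:
        flat.extend(flatten(element))
    return flat

# Made-up data shaped like the output shown in the question.
nested_dict = {'4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')}

# dict.fromkeys keeps only the first occurrence of each entry, in order.
deduped = {k: tuple(dict.fromkeys(flatten(v))) for k, v in nested_dict.items()}
print(deduped)  # {'4241': ('VUVV', 'DDRV', 'DDVG')}
```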
Hope this helps
Upvotes: 0
Reputation: 375415
You can add them as tuples:
if nCatSeq not in potBasin:
    potBasin[nCatSeq] = (nBasinSeq,)
else:
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq] = potBasin[nCatSeq] + (nBasinSeq,)
That way, rather than:
(('VUVV', 'DDRV'), 'DDVG')
# you will get
('VUVV', 'DDRV', 'DDVG') # == ('VUVV', 'DDRV')+ ('DDVG',)
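To make the difference concrete, here is a small sketch with made-up values taken from the question. It also suggests why a duplicate such as 'VUVV' can slip through in the original code: a membership test on a tuple only looks at its top-level elements.

```python
basins = ('VUVV', 'DDRV')

nested = basins, 'DDVG'      # the comma builds a new 2-tuple around the old tuple
flat = basins + ('DDVG',)    # + concatenates two tuples into one flat tuple

print(nested)  # (('VUVV', 'DDRV'), 'DDVG')
print(flat)    # ('VUVV', 'DDRV', 'DDVG')

# `in` only checks top-level elements, so the nested form hides 'VUVV'.
print('VUVV' in nested)  # False
print('VUVV' in flat)    # True
```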
Upvotes: 1
Reputation: 16327
The problem is that using a comma to "append" an element creates a new tuple wrapped around the old one every time. To solve this, use lists and append:
nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=[nBasinSeq]
elif nBasinSeq not in potBasin[nCatSeq]:
    potBasin[nCatSeq].append(nBasinSeq)
Even better, instead of making potBasin a normal dictionary, replace it with a defaultdict. The code can then be simplified to:
# init stuff
from collections import defaultdict
potBasin = defaultdict(list)
# inside loop
nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nBasinSeq not in potBasin[nCatSeq]:  # keep the no-redundancy check
    potBasin[nCatSeq].append(nBasinSeq)
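For reference, a self-contained sketch of the defaultdict approach on made-up data, keeping the duplicate check so that each nBasinSeq is stored only once per nCatSeq. The records list and the WINDOW constant are hypothetical, and the slices assume that item[1] and item[2] are plain strings (in which case a slice of length 4 equals the four-character concatenation used above).

```python
from collections import defaultdict

# Hypothetical records: (identifier, category string, basin string).
records = [
    ("demo1", "42414241", "VUVVDDRV"),
    ("demo2", "4241", "VUVV"),
]

WINDOW = 4  # length of each nCatSeq / nBasinSeq fragment
potBasin = defaultdict(list)

for item in records:
    for n in range(len(item[1]) - WINDOW + 1):
        nCatSeq = item[1][n:n + WINDOW]
        nBasinSeq = item[2][n:n + WINDOW]
        # A missing key starts as an empty list, so no explicit key test is needed.
        if nBasinSeq not in potBasin[nCatSeq]:
            potBasin[nCatSeq].append(nBasinSeq)

print(potBasin['4241'])  # ['VUVV', 'DDRV']
```

If the order of the basins does not matter, defaultdict(set) with .add() would avoid the membership test altogether.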
Upvotes: 5