Reputation: 301
Using Python 3 I am trying to get a dictionary of names and counts of the occurrence of certain strings in one long string.
I am sitting here pulling my hair out as this should not be complicated but I have read a lot of answers to this already and I still am not getting it. I'm 5 hours in and definitely not seeing the wood for the trees now.
Hopefully, someone can show me where I am going wrong.
The string is called seq.
seq = 'AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG'
I have a CSV of words I am looking for and that is in a list called nu
nu = ['AGATC', 'AATG', 'TATC']
The code should take each of the words in nu and count the number of its occurrences in seq.
Here is my loop
for i in nu:
    searchstr = {}
    # Line returns a dict of the last value added
    searchstr = dict(key = (i), count = (seq.count(i)))
    print(searchstr)
print(searchstr.keys())
print(searchstr.values())
and the output so I know I'm matching the count correctly with the keys:
{'key': 'AGATC', 'count': 4}
{'key': 'AATG', 'count': 1}
{'key': 'TATC', 'count': 5}
dict_keys(['key', 'count'])
dict_values(['TATC', 5])
I just can't for the life of me get the three dicts into one. I am just left with a dict of ['TATC', 5] as it has overwritten the previous in the list.
I'm still new to this but trying to learn along the way.
Upvotes: 0
Views: 203
Reputation: 7490
You declare searchstr at each iteration of the loop; that's why you only ever see the last inserted key.
I don't know if this suggestion is welcome, but instead of storing key and count as separate entries, I would just use the searched DNA sequence as the key and its count as the value. Something like this:
searchstr = dict()
for i in nu:
    searchstr[i] = seq.count(i)
print(searchstr.keys())
print(searchstr.values())
print(searchstr)
print(searchstr['AATG'])  # reading a specific result
Output:
dict_keys(['AGATC', 'AATG', 'TATC'])
dict_values([4, 1, 5])
{'AGATC': 4, 'AATG': 1, 'TATC': 5}
1
As you can see, the dictionary just needs to be declared outside the loop; inside the loop you add one element for every searched string.
Note how much easier it is to access the count for a specific sequence this way.
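The same loop can also be written as a dict comprehension. This is only a minimal sketch: the shortened seq below is a made-up sample rather than the string from the question, and the names counts and word are illustrative.

```python
# Hypothetical shortened sample, not the original sequence from the question
seq = "AGATCAGATCAATGTATC"
nu = ['AGATC', 'AATG', 'TATC']

# Build the whole dictionary in one expression:
# each searched string becomes a key, its count in seq becomes the value
counts = {word: seq.count(word) for word in nu}

print(counts)          # {'AGATC': 2, 'AATG': 1, 'TATC': 1}
print(counts['AATG'])  # 1
```

Because the dictionary is built in a single expression, there is no per-iteration assignment that could accidentally overwrite it.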
Upvotes: 0
Reputation: 1306
searchstr = {}
for i in nu:
    # Earlier the dictionary was declared here, which overwrote the previous value.
    # Assign by key so that every count is added to the same dictionary.
    searchstr[i] = seq.count(i)
    print(searchstr)
print(searchstr.keys())
print(searchstr.values())
Move the dictionary declaration outside the loop and assign each count by key, so earlier entries are kept.
Upvotes: 0
Reputation: 311
I think this is what you want:
searchstr = {}
for i in nu:
    # Store the count under the searched string as the key
    searchstr[i] = seq.count(i)
print(searchstr)
Upvotes: 1
Reputation: 13079
All you need is to assign elements to the dictionary, not create a new dictionary each time:
searchstr = {}
for i in nu:
    searchstr[i] = seq.count(i)
print(searchstr)
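One thing to keep in mind: str.count counts non-overlapping occurrences. That makes no difference for words that cannot overlap themselves, but it would for a word like 'AA'. If overlapping matches ever need to be counted, one possible approach is a regex lookahead. The sketch below uses a hypothetical helper name (count_overlapping) and a made-up sample string.

```python
import re

def count_overlapping(text, word):
    # Hypothetical helper: a zero-width lookahead matches at every
    # position where word starts, so overlapping hits are all counted.
    return len(re.findall(f"(?={re.escape(word)})", text))

sample = "AAAA"  # made-up sample string

print(sample.count("AA"))               # 2 (non-overlapping)
print(count_overlapping(sample, "AA"))  # 3 (overlapping)
```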
Upvotes: 1