ScuffedCoder

Reputation: 408

Python Iterating through two lists only iterates through last element

I am trying to iterate through a nested list but am getting incorrect results. I want the total count of each element across all of the sublists.

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

dict = {}

for words in l:
    for letters in words:
        dict[letters] = words.count(letters)


for x in dict:
    print(x + ":" + str(dict[x]))

At the moment, I am getting:

<s>:1
a:1
b:2
c:2
</s>:1

It seems as if it is only iterating through the last list in 'l' : ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']

but I am trying to get:

<s>: 3
a: 4
b: 5
c: 6
</s>:3

Upvotes: 2

Views: 86

Answers (4)

Somya Avasthi

Reputation: 121

As you noticed, the dictionary ends up holding only the counts from the last sublist. This happens because on every iteration the previous dictionary value is overwritten by the count for the current sublist. You need to keep the previous value and add to it instead.

You can try this:

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
d={}
for lis in l:
    for x in lis:
        if x in d:
            d[x]+=1
        else:
            d[x]=1

The resulting dictionary d will be:

{'<s>': 3, 'a': 4, 'c': 6, 'b': 5, '</s>': 3}

I hope this helps!

Upvotes: 0

Vishnudev Krishnadas

Reputation: 10960

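The dictionary value is being overwritten in every iteration; it should be incremented instead. Add 1 for each element seen (adding words.count(letters) for every element would count repeated elements more than once):

count_dict[letters] += 1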

Initialize the dictionary with defaultdict so that missing keys start at 0:

from collections import defaultdict
count_dict = defaultdict(int)
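
A complete version of this idea, as a minimal sketch: it assumes the list l from the question and increments by 1 for each element seen, so repeated elements within a sublist are not counted twice. The print loop at the end is only an illustration of the output format asked for.

```python
from collections import defaultdict

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

# defaultdict(int) supplies 0 for a missing key, so += never raises KeyError
count_dict = defaultdict(int)

for words in l:
    for letters in words:
        # add to the running total instead of overwriting it
        count_dict[letters] += 1

for key, value in count_dict.items():
    print(key + ": " + str(value))
```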

Upvotes: 1

DylannCordel

Reputation: 595

As @Vishnudev said, you must add to the current count. But dict[letters] must already exist (otherwise you'll get a KeyError exception). You can use the get method of dict with a default value to avoid this:

l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], 
     ['<s>', 'a', 'c', 'b', 'c', '</s>'], 
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

dict = {}
for words in l:
    for letters in words:
        dict[letters] = dict.get(letters, 0) + 1

Upvotes: 0

timgeb

Reputation: 78690

In each iteration of the inner for loop, you are not adding to the current value of dict[letters]; you are setting it to the count for the current sublist, which is (peculiarly) named words.

Fixing your code with a vanilla dict:

>>> l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'], ['<s>', 'a', 'c', 'b', 'c', '</s>'], ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]
>>> d = {}
>>> for sublist in l:
...     for x in sublist:
...         d[x] = d.get(x, 0) + 1
...
>>> d
{'<s>': 3, 'a': 4, 'b': 5, 'c': 6, '</s>': 3}

Note that I am not calling list.count in each inner for loop. Calling count will iterate over the whole list again and again. It is far more efficient to just add 1 every time a value is seen, which can be done by looking at each element of the (sub)lists exactly once.
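
To illustrate the point, here is a rough sketch comparing the two approaches. The names counts_via_count and counts_single_pass are made up for this example. The first variant keeps list.count but calls it only once per distinct element of each sublist (via set), the second looks at each element once.

```python
l = [['<s>', 'a', 'a', 'b', 'b', 'c', 'c', '</s>'],
     ['<s>', 'a', 'c', 'b', 'c', '</s>'],
     ['<s>', 'b', 'c', 'c', 'a', 'b', '</s>']]

# Variant that keeps list.count: one call per distinct element of each
# sublist, added to the running total. Each call rescans the sublist.
counts_via_count = {}
for sublist in l:
    for x in set(sublist):
        counts_via_count[x] = counts_via_count.get(x, 0) + sublist.count(x)

# Single-pass variant: every element is visited exactly once.
counts_single_pass = {}
for sublist in l:
    for x in sublist:
        counts_single_pass[x] = counts_single_pass.get(x, 0) + 1

print(counts_via_count == counts_single_pass)  # True
```

Both give the same totals, but the single-pass version avoids rescanning each sublist.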

Using a Counter:

>>> from collections import Counter
>>> Counter(x for sub in l for x in sub)
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})

Using a Counter and not manually unnesting the nested list:

>>> from collections import Counter
>>> from itertools import chain
>>> Counter(chain.from_iterable(l))
Counter({'c': 6, 'b': 5, 'a': 4, '<s>': 3, '</s>': 3})

Upvotes: 2
