JoeF

Reputation: 853

Count values with items in a dictionary with sublists

I'm being warned that this question has been frequently downvoted, but I haven't seen a solution for my particular problem.

I have a dictionary that looks like this:

d = {'a': [['I', 'said', 'that'], ['said', 'I']], 
    'b':[['she', 'is'], ['he', 'was']]}

I would like the output to be a dictionary with the original keys, each mapping to a dictionary that gives the count for each of the words (e.g., {'a': {'I': 2, 'said': 2, 'that': 1}}, and likewise for 'b').

If the values were in a list instead of a sublist, I could get what I wanted just by using Counter:

d2 = {'a': ['I','said','that', 'I'],'b': ['she','was','here']}
from collections import Counter
counts = {k: Counter(v) for k, v in d2.items()}

However, with d I'm getting TypeError: unhashable type: 'list', because the items Counter tries to count are themselves lists (the sublists), and lists aren't hashable.

I also know that if I just had sublists, I could get what I want with something like:

lst = [['I', 'said', 'that'], ['said', 'I']]
Counter(word for sublist in lst for word in sublist)

But I just can't figure out how to combine these ideas to solve my problem (and I guess it lies in combining these two).

I did try this

for key, values in d.items():
    flat_list = [item for sublist in values for item in sublist]
    new_dict = {key: flat_list}
    counts = {k: Counter(v) for k, v in new_dict.items()}

But that only gives me the counts for the second key (new_dict and counts are overwritten on each iteration, so only the last key survives).

Upvotes: 0

Views: 137

Answers (3)

Sahil Agarwal

Reputation: 595

Use both the itertools and collections modules for this. Flatten the nested lists with itertools.chain and count the words with collections.Counter.

import collections
import itertools

d = {
    'a': [['I', 'said', 'that'], ['said', 'I']],
    'b': [['she', 'is'], ['he', 'was']],
}
out_dict = {}
for d_key, data in d.items():
    # chain(*data) flattens the sublists into a single stream of words
    counter = collections.Counter(itertools.chain(*data))
    out_dict[d_key] = counter
print(out_dict)

Output:

{'a': Counter({'I': 2, 'said': 2, 'that': 1}),
 'b': Counter({'she': 1, 'is': 1, 'he': 1, 'was': 1})}

Upvotes: 0

ludaavics

Reputation: 678

You can merge your sublists to get your d2: d2 = {k: reduce(list.__add__, d[k], []) for k in d}.

In Python 3, reduce is no longer a builtin, so you will first need: from functools import reduce
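As a rough, runnable sketch of this approach: the sample d is copied from the question, and feeding the merged lists into Counter is an assumption about the intended next step, taken from the question's own first snippet.

```python
from collections import Counter
from functools import reduce  # a builtin on Python 2; lives in functools on Python 3

d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}

# Concatenate each key's sublists into one flat list (this rebuilds "d2").
d2 = {k: reduce(list.__add__, d[k], []) for k in d}

# Count the words in each flat list, as the question does for d2.
counts = {k: Counter(v) for k, v in d2.items()}
print(counts)
```

Note that repeated list concatenation copies the accumulated list at every step, so for keys with many sublists a generator-based flatten may be the cheaper choice.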

Upvotes: 0

Alex Hall

Reputation: 36043

To combine the two solutions, just replace Counter(v) from your first solution with the second solution.

from collections import Counter

d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}


counts = {k: Counter(word
                     for sublist in lst
                     for word in sublist)
          for k, lst in d.items()}

print(counts)

Output:

{'a': Counter({'I': 2, 'said': 2, 'that': 1}),
 'b': Counter({'she': 1, 'is': 1, 'he': 1, 'was': 1})}
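Counter is a dict subclass, so the result above can usually be used as-is. If plain dictionaries are wanted, as in the example output in the question, one possible sketch is to wrap each Counter in dict(). The name plain_counts is made up for this illustration.

```python
from collections import Counter

d = {'a': [['I', 'said', 'that'], ['said', 'I']],
     'b': [['she', 'is'], ['he', 'was']]}

# Same nested comprehension, with each Counter converted to a plain dict.
plain_counts = {k: dict(Counter(word
                                for sublist in lst
                                for word in sublist))
                for k, lst in d.items()}

print(plain_counts)
```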

Upvotes: 2
