user3390906

Reputation: 157

Python - count and group items in list stored in dictionary

I have seen examples of how to count items in a dictionary or a list. My dictionary stores multiple lists, and each list stores multiple items.

d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

1. I want to count the frequency of each letter, i.e. the result should be

A - 4  
B - 1  
C - 2  
D - 1  
E - 1  
F - 1

2. I want to group the keys by letter, i.e. the result should be

A - text1, text2, text4, text5  
B - text4  
C - text1, text3  
D - text3  
E - text1  
F - text1  

How can I achieve both using existing Python libraries, without writing many for loops?

Upvotes: 1

Views: 5307

Answers (5)

Chris Larson

Reputation: 1724

There are a few ways to accomplish this, but if you'd rather not import additional modules or install external packages, this method works cleanly out of the box.

With d as your starting dictionary:

d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

Create a new dict, called letters, for your results to live in, and populate it by looping over the keys of d and the letters in each list. If a letter isn't in letters yet, create its key, with a list holding a count of 1 and a list containing the current key from d as its value. If it's already there, increment the count and append the current key from d to the list of keys in its value.

letters = {}
for item in d.keys():
    for letter in d[item]:
        if letter not in letters.keys():
            letters[letter] = [1,[item]]            
        else:
            letters[letter][0] += 1
            letters[letter][1] += [item]

This leaves you with a dict called letters containing values of the counts and the keys from d that contain the letter, like this:

{'E': [1, ['text1']], 'C': [2, ['text3', 'text1']], 'F': [1, ['text1']], 'A': [4, ['text2', 'text4', 'text1', 'text5']], 'B': [1, ['text4']], 'D': [1, ['text3']]}

Now, to print your first list, do:

for letter in sorted(letters):
    print(letter, letters[letter][0])

This prints each letter together with the first, or 'count', element of the list in its value, using the built-in sorted() function to put things in order.

To print the second list, likewise sorted, do the same, but with the second, or 'key', element of the list in its value, joined into a single string with ', '.join():

for letter in sorted(letters):
    print(letter, ', '.join(letters[letter][1]))

To ease Copy/Paste, here's the code unbroken by my ramblings:

d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}

letters = {}
for item in d.keys():
    for letter in d[item]:
        if letter not in letters.keys():
            letters[letter] = [1,[item]]            
        else:
            letters[letter][0] += 1
            letters[letter][1] += [item]

print(letters)

for letter in sorted(letters):
    print(letter, letters[letter][0])
print()
for letter in sorted(letters):
    print(letter, ', '.join(letters[letter][1]))

Hope this helps!

Upvotes: 2

hiro protagonist

Reputation: 46921

This is a way to achieve this:

from collections import defaultdict

alphabets = defaultdict(list)

for text, letters in d.items():
    for letter in letters:
        alphabets[letter].append(text)

for letter, texts in sorted(alphabets.items()):
    print(letter, texts)

for letter, texts in sorted(alphabets.items()):
    print(letter, len(texts))

Note that once you have A - text1, text2, text4, text5, getting to A - 4 is just a matter of counting the texts.
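
To make that last point concrete, here is a minimal sketch that builds the groups once and derives the counts from them. The names groups and counts are only illustrative, and d is the dictionary from the question.

```python
from collections import defaultdict

d = {'text1': ['A', 'C', 'E', 'F'],
     'text2': ['A'],
     'text3': ['C', 'D'],
     'text4': ['A', 'B'],
     'text5': ['A']}

# Group the keys of d under each letter they contain.
groups = defaultdict(list)
for text, letters in d.items():
    for letter in letters:
        groups[letter].append(text)

# The frequency of a letter is simply the size of its group.
counts = {letter: len(texts) for letter, texts in groups.items()}

print(sorted(counts.items()))
```

This only holds as long as a letter appears at most once per list; if a list could repeat a letter, the group would contain the same key twice and the count would reflect occurrences rather than distinct keys.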

Upvotes: 0

wencakisa

Reputation: 5968

For your first task:

from collections import Counter


d = {
  'text1': ['A', 'C', 'E', 'F'],
  'text2': ['A'],
  'text3': ['C', 'D'],
  'text4': ['A', 'B'],
  'text5': ['A']
}

occurrences = Counter(''.join(''.join(values) for values in d.values()))
print(sorted(occurrences.items(), key=lambda l: l[0]))

Now let me explain it:

  • ''.join(values) turns each list into a string (e.g. ['A', 'B', 'C', 'D'] into 'ABCD').
  • The outer ''.join() then joins those per-list strings into one string.
  • Counter is a class from the standard-library module collections. It counts the elements of an iterable (the characters of the string in this case) and stores them as a dictionary-like mapping from element to count; its items() method yields (key, value) pairs (e.g. ('A', 4)).
  • Finally, I sort the Counter items (it's just like a dictionary) alphabetically with key=lambda l: l[0], where l[0] is the letter from the (key, value) pair.

As far as I can see, you already have a solution for your second problem.
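
One caveat about joining everything into a string: Counter then counts characters, so the result only matches the list items when every item is a single character. The sketch below contrasts this with counting the items directly via itertools.chain.from_iterable. The input with the two-character item 'AB' is made up for illustration and is not from the question.

```python
from collections import Counter
from itertools import chain

# Hypothetical input: 'AB' is a single item made of two characters.
d = {'text1': ['A', 'AB'],
     'text2': ['A']}

# Joining into one string counts individual characters.
by_char = Counter(''.join(''.join(values) for values in d.values()))

# Chaining the lists counts whole items instead.
by_item = Counter(chain.from_iterable(d.values()))

print(sorted(by_char.items()))  # [('A', 3), ('B', 1)]
print(sorted(by_item.items()))  # [('A', 2), ('AB', 1)]
```

For the single-letter data in the question both approaches give the same counts.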

Upvotes: 0

宏杰李

Reputation: 12178

from collections import Counter, defaultdict
from itertools import chain
d  = {'text1': ['A', 'C', 'E', 'F'], 
      'text2': ['A'], 
      'text3': ['C', 'D'], 
      'text4': ['A', 'B'], 
      'text5': ['A']}
counter = Counter(chain.from_iterable(d.values()))
group = defaultdict(list)
for k, v in d.items():
    for i in v:
        group[i].append(k)

out:

Counter({'A': 4, 'B': 1, 'C': 2, 'D': 1, 'E': 1, 'F': 1})
defaultdict(list,
            {'A': ['text2', 'text4', 'text1', 'text5'],
             'B': ['text4'],
             'C': ['text1', 'text3'],
             'D': ['text3'],
             'E': ['text1'],
             'F': ['text1']})
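
To turn these two objects into the "A - 4" style lines from the question, something like the following sketch should do. The names count_lines and group_lines are only illustrative. It assumes Python 3.7 or later, where dicts preserve insertion order, so the keys in each group appear in the order they occur in d; on older versions the order inside each group may differ, as in the output above.

```python
from collections import Counter, defaultdict
from itertools import chain

d = {'text1': ['A', 'C', 'E', 'F'],
     'text2': ['A'],
     'text3': ['C', 'D'],
     'text4': ['A', 'B'],
     'text5': ['A']}

counter = Counter(chain.from_iterable(d.values()))
group = defaultdict(list)
for k, v in d.items():
    for i in v:
        group[i].append(k)

# One formatted line per letter, in alphabetical order.
count_lines = [f"{letter} - {counter[letter]}" for letter in sorted(counter)]
group_lines = [f"{letter} - {', '.join(group[letter])}" for letter in sorted(group)]

print('\n'.join(count_lines))
print()
print('\n'.join(group_lines))
```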

Upvotes: 0

UltraInstinct

Reputation: 44484

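To get to (2), you first have to invert the keys and values of the dictionary and store the resulting (letter, key) pairs in a list. Once you have that, use groupby with a key function to get to the structure of (2). groupby only groups consecutive items, which is why the list is sorted first.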

from itertools import groupby

arr = [(x,t) for t, a in d.items() for x in a]
# [('A', 'text2'), ('C', 'text3'), ('D', 'text3'), ('A', 'text1'), ('C', 'text1'), ('E', 'text1'), ('F', 'text1'), ('A', 'text4'), ('B', 'text4'), ('A', 'text5')]

res = {g: [x[1] for x in items] for g, items in groupby(sorted(arr), key=lambda x: x[0])}
#{'A': ['text1', 'text2', 'text4', 'text5'], 'C': ['text1', 'text3'], 'B': ['text4'], 'E': ['text1'], 'D': ['text3'], 'F': ['text1']}

res2 = {x: len(y) for x, y in res.items()}
#{'A': 4, 'C': 2, 'B': 1, 'E': 1, 'D': 1, 'F': 1}

PS: I am hoping you'd use meaningful variable names in your real code.
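
The sorted() call in the groupby line matters, because itertools.groupby only merges runs of adjacent items with the same key. A small sketch of what happens without it; the three pairs are a made-up subset chosen so that the 'A' entries are not adjacent.

```python
from itertools import groupby

# Illustrative pairs in which the two 'A' entries are separated by a 'C'.
pairs = [('A', 'text1'), ('C', 'text1'), ('A', 'text2')]

def first(pair):
    """Key function: group by the letter in each (letter, text) pair."""
    return pair[0]

# Without sorting, 'A' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, key=first)]

# In a dict comprehension the later 'A' group overwrites the earlier one.
res_unsorted = {key: [text for _, text in items]
                for key, items in groupby(pairs, key=first)}

# Sorting first makes equal keys adjacent, so each letter gets one group.
res_sorted = {key: [text for _, text in items]
              for key, items in groupby(sorted(pairs), key=first)}

print(unsorted_keys)  # ['A', 'C', 'A']
print(res_unsorted)   # {'A': ['text2'], 'C': ['text1']}
print(res_sorted)     # {'A': ['text1', 'text2'], 'C': ['text1']}
```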

Upvotes: 4
