Reputation: 2319

finding frequencies of pair items in a list of pairs

Let's say I have a long list of this type:

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'], ... ]

For each first element, I want to build a dictionary that counts the second elements paired with it. For the example above, I'd like to have something like this:

{'a': {'b':2, 'd':1},
 'w': {'a':1}
}

Here's how I unsuccessfully tried to solve it. I constructed a list of unique first elements. Let's call it words and then:

dic = {}

for word in words:
  inner_dic = {}
  for pair in text:
    if pair[0] == word:
      num = text.count(pair)
      inner_dic[pair[1]] = num
  dic[pair[0]] = inner_dic

I get an obviously erroneous result. One problem with the code is that it seems to overcount pairs, and I am not sure how to fix it.

Upvotes: 4

Answers (5)

dawg

Reputation: 103744

Here is a way using the .setdefault method:

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]
result = {}
for x, y in text:
    result.setdefault(x, {}).setdefault(y, 0)
    result[x][y] += 1

>>> result 
{'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}

No external library required.

Upvotes: 1

Padraic Cunningham

Reputation: 180391

You can use a defaultdict combined with a Counter dict:

from collections import Counter, defaultdict
d = defaultdict(Counter)

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]

for k, v in text:
    d[k][v] += 1  # v is a single value
    # use d[k].update(v) instead if v is a list of words

from pprint import pprint as pp

pp(d)
{'a': Counter({'b': 2, 'd': 1}),
'w': Counter({'a': 1})}

The defaultdict creates a new key/value pair, with an empty Counter as the value, whenever the key does not exist yet. If the key already exists, we simply increment the count. For a list of values we would use Counter.update, which, unlike dict.update, adds to the existing counts instead of overwriting them.
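A minimal sketch of the difference between Counter.update and dict.update, using made-up sample data (the names counts and plain are illustrative and do not come from the answer):

```python
from collections import Counter

# Counter.update adds one count per element of the iterable.
counts = Counter({'b': 1})
counts.update(['b', 'd'])

# dict.update replaces the value stored under an existing key.
plain = {'b': 1}
plain.update({'b': 1, 'd': 1})

print(counts)  # 'b' has been incremented to 2
print(plain)   # 'b' is still 1
```

This is why d[k].update(v) is safe to call repeatedly for the same key k when v is a list of words.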

Using a normal dict without imports, we can use dict.setdefault, which creates a new dict as the value for each missing key k and sets a default count of 0 for each missing subkey v:

d = {}
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]

for k, v in text:
    d.setdefault(k, {}).setdefault(v,0)
    d[k][v] += 1
pp(d)
{'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}

Upvotes: 5

Raymond Hettinger

Reputation: 226221

The collections module makes short work of tasks like this.

Use a Counter for the counting part (it is a kind of dictionary that returns 0 for missing keys, making it easy to use += 1 for incrementing counts). Use defaultdict for the outer dict (it automatically makes a new Counter for each new "first" prefix):

>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
>>> for first, second in text:
...     d[first][second] += 1
...

Here is the equivalent using regular dictionaries:

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = {}
for first, second in text:
    if first not in d:
        d[first] = {}
    inner_dict = d[first]
    if second not in inner_dict:
        inner_dict[second] = 0
    inner_dict[second] += 1

Either the short way or the long way will work perfectly with the json module (both Counter and defaultdict are kinds of dicts that can be JSON encoded).
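As a rough illustration of the JSON point, the sketch below encodes the defaultdict of Counters with json.dumps and reads it back. The names encoded and decoded are illustrative, and sort_keys=True is only there to make the output order predictable:

```python
import json
from collections import Counter, defaultdict

text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

d = defaultdict(Counter)
for first, second in text:
    d[first][second] += 1

# Both containers are dict subclasses, so no custom encoder is needed.
encoded = json.dumps(d, sort_keys=True)
print(encoded)

# Decoding gives back plain dicts, not Counter or defaultdict objects.
decoded = json.loads(encoded)
print(decoded)
```

Note that the round trip loses the Counter and defaultdict types, so missing keys raise KeyError again on the decoded result.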

Hope this helps. Good luck with your text analysis :-)

Upvotes: 5

itzMEonTV

Reputation: 20339

text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
d = {}
for i in text:
    if d.get(i[0]):
        if d[i[0]].get(i[1]):
            d[i[0]][i[1]] +=1
        else:
            d[i[0]][i[1]] = 1 
    else:
        d[i[0]] = {i[1] : 1}
print(d)
# {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}

Upvotes: 0

rlms

Reputation: 11060

You should do this instead:

for word in words:
  inner_dic = {}
  for pair in text:
    if pair[0] == word:
      num = text.count(pair)
      inner_dic[pair[1]] = num
  dic[word] = inner_dic

That is, assign to dic[word] rather than dic[pair[0]]. Once the inner loop finishes, pair is always the last pair in text, so dic[pair[0]] stores inner_dic under the first element of that last pair, even when pair[0] isn't word.
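For completeness, here is a self-contained sketch of the corrected loop. The question does not show how words was built, so the dict.fromkeys line is an assumption that simply collects the unique first elements in first-seen order:

```python
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]

# Assumption: `words` holds the unique first elements, in first-seen order.
words = list(dict.fromkeys(pair[0] for pair in text))

dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            # text.count(pair) rescans the whole list on every call.
            inner_dic[pair[1]] = text.count(pair)
    dic[word] = inner_dic  # key by word, not by pair[0]

print(dic)
```

This gives the desired counts, but because text.count runs inside a nested loop it gets slow on a long list; the single-pass approaches in the other answers avoid that.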

Upvotes: 5
