Reputation: 2319
Let's say I have a long list of this type:
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'], ... ]
Given the first elements, I want to construct a dictionary that would show a count of the second elements. For example in the particular example above, I'd like to have something like this:
{'a': {'b':2, 'd':1},
'w': {'a':1}
}
Here's how I unsuccessfully tried to solve it. I constructed a list of unique first elements (let's call it words), and then:
dic = {}
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            num = text.count(pair)
            inner_dic[pair[1]] = num
    dic[pair[0]] = inner_dic
I get an obviously erroneous result. One problem with the code is that it overcounts pairs. I am not sure how to solve this.
Upvotes: 4
Views: 1744
Reputation: 103744
Here is a way using the .setdefault method:
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]
result={}
for x, y in text:
    result.setdefault(x, {}).setdefault(y, 0)
    result[x][y] += 1
>>> result
{'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
No external library required.
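To see why the two chained calls are enough, here is a small sketch of what setdefault returns at each step. The variable names inner, start and same_inner are made up for illustration and are not part of the answer above.

```python
result = {}

# First call: 'a' is missing, so {} is stored under 'a' and returned.
inner = result.setdefault('a', {})
# Second call: 'b' is missing in the inner dict, so 0 is stored and returned.
start = inner.setdefault('b', 0)
inner['b'] += 1

# Calling setdefault again on existing keys does not reset anything.
same_inner = result.setdefault('a', {})
same_inner.setdefault('b', 0)  # returns the existing count, leaves it alone
same_inner['b'] += 1

print(result)  # {'a': {'b': 2}}
```

Because setdefault only stores the default when the key is absent, the same two lines work for both the first and every later occurrence of a pair.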
Upvotes: 1
Reputation: 180391
You can use a defaultdict combined with a Counter dict:
from collections import Counter, defaultdict
d = defaultdict(Counter)
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]
for k, v in text:
    d[k][v] += 1 # for single value
    # d[k].update(v) for multiple values i.e list of words
from pprint import pprint as pp
pp(d)
{'a': Counter({'b': 2, 'd': 1}),
'w': Counter({'a': 1})}
The defaultdict creates a new key/value pairing, with a Counter as the value, whenever the key does not exist. If the key already exists, we just update its Counter. With the commented-out alternative, Counter.update increments the existing counts, whereas dict.update would overwrite them.
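The difference between the two update methods can be checked with a short sketch. The names plain and counts are illustrative only.

```python
from collections import Counter

plain = {'b': 1}
plain.update({'b': 1})     # dict.update overwrites the value: still 1

counts = Counter({'b': 1})
counts.update(['b', 'd'])  # Counter.update adds to the counts

print(plain)   # {'b': 1}
print(counts)  # Counter({'b': 2, 'd': 1})
```

Note that Counter.update iterates over its argument, so a bare string would be counted character by character. Pass a list of words when each value holds several of them.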
Using a normal dict:
Without imports, we can use dict.setdefault, which will create a new dict as the value for each key k and set a default value of 0 for each subkey v:
d = {}
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b'] ]
for k, v in text:
    d.setdefault(k, {}).setdefault(v, 0)
    d[k][v] += 1
pp(d)
{'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
Upvotes: 5
Reputation: 226221
The collections module makes short work of tasks like this.
Use a Counter for the counting part (it is a kind of dictionary that returns 0 for missing values, making it easy to use +=1 for incrementing counts). Use defaultdict for the outer dict (it can automatically make a new counter for each "first" prefix):
>>> from collections import defaultdict, Counter
>>> d = defaultdict(Counter)
>>> text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
>>> for first, second in text:
...     d[first][second] += 1
...
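The two behaviours relied on here can be checked in isolation. This is only a sketch; the names c and missing are illustrative.

```python
from collections import Counter, defaultdict

c = Counter()
missing = c['b']   # 0, and the lookup itself does not store a key
c['b'] += 1        # now 'b' is stored with a count of 1

d = defaultdict(Counter)
d['a']['b'] += 1   # d['a'] is created as an empty Counter on first access

print(missing)     # 0
print(d['a']['b']) # 1
```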
Here is the equivalent using regular dictionaries:
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
d = {}
for first, second in text:
    if first not in d:
        d[first] = {}
    inner_dict = d[first]
    if second not in inner_dict:
        inner_dict[second] = 0
    inner_dict[second] += 1
Either the short way or the long way will work perfectly with the json module (both Counter and defaultdict are kinds of dicts that can be JSON encoded).
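As a quick check of the JSON claim, the sketch below encodes the defaultdict of Counters directly. The sort_keys=True argument is only there to make the output order predictable; encoded and decoded are illustrative names.

```python
import json
from collections import Counter, defaultdict

text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
d = defaultdict(Counter)
for first, second in text:
    d[first][second] += 1

# Both containers are dict subclasses, so json.dumps accepts them as-is.
encoded = json.dumps(d, sort_keys=True)
print(encoded)  # {"a": {"b": 2, "d": 1}, "w": {"a": 1}}

# Decoding gives back plain dicts, not Counter or defaultdict objects.
decoded = json.loads(encoded)
```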
Hope this helps. Good luck with your text analysis :-)
Upvotes: 5
Reputation: 20339
text = [ ['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
d = {}
for i in text:
    if d.get(i[0]):
        if d[i[0]].get(i[1]):
            d[i[0]][i[1]] += 1
        else:
            d[i[0]][i[1]] = 1
    else:
        d[i[0]] = {i[1]: 1}
print(d)
>>>{'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
Upvotes: 0
Reputation: 11060
You should do this instead:
for word in words:
    inner_dic = {}
    for pair in text:
        if pair[0] == word:
            num = text.count(pair)
            inner_dic[pair[1]] = num
    dic[word] = inner_dic
That is, you should be doing dic[word] rather than dic[pair[0]], which will assign the inner_dic to the first element in the last pair checked, even if pair[0] isn't word.
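The effect of the wrong key can be reproduced with the sample data from the question. This is a sketch: the helper build and its use_word_as_key flag are hypothetical, and words is assumed to be ['a', 'w'].

```python
text = [['a', 'b'], ['a', 'd'], ['w', 'a'], ['a', 'b']]
words = ['a', 'w']  # assumed list of unique first elements

def build(use_word_as_key):
    dic = {}
    for word in words:
        inner_dic = {}
        for pair in text:
            if pair[0] == word:
                inner_dic[pair[1]] = text.count(pair)
        # After the inner loop, `pair` is always the last pair in text.
        key = word if use_word_as_key else pair[0]
        dic[key] = inner_dic
    return dic

print(build(False))  # {'a': {'a': 1}}
print(build(True))   # {'a': {'b': 2, 'd': 1}, 'w': {'a': 1}}
```

With pair[0] as the key, every inner_dic is stored under 'a' (the first element of the last pair), so the counts for 'w' overwrite the counts for 'a'.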
Upvotes: 5