crazyCoder

Reputation: 1582

How to create a dictionary of dictionaries of dictionaries in Python

So I am taking a natural language processing class and I need to create a trigram language model to generate random text that looks "realistic" to a certain degree based off of some sample data.

Essentially, I need to create a "trigram" structure to hold the various three-word combinations. My professor hinted that this can be done with a dictionary of dictionaries of dictionaries, which I attempted to create using:

trigram = defaultdict( defaultdict(defaultdict(int)))

However I get an error that says:

trigram = defaultdict( dict(dict(int)))
TypeError: 'type' object is not iterable

How would I go about creating a three-layer nested dictionary, i.e. a dictionary of dictionaries of dictionaries of int values?

I guess people vote down a question on Stack Overflow if they don't know how to answer it. I'll add some background to better explain the question for those willing to help.

This trigram is used to keep track of three-word patterns. They are used in text-processing software and almost everywhere in natural language processing (think Siri or Google Now).

If we designate the three levels of dictionaries as dict1, dict2, and dict3, then parsing a text file and reading the statement "The boy runs" would give the following:

dict1 has a key "the". Accessing that key returns dict2, which contains the key "boy". Accessing that key returns the final dict3, which contains the key "runs". Accessing that key returns the value 1.

This means that "the boy runs" has appeared once in this text. If we encounter it again, we follow the same process and increment the 1 to 2. If we encounter "the girl walks", then dict2 (the dictionary stored under the key "the") gains another key, "girl", whose dict3 has the key "walks" with a value of 1, and so forth. Eventually, after parsing a ton of text (and keeping track of the word counts), you will have a trigram model that can estimate how likely a given starting word is to lead to a particular three-word combination, based on how often each combination appeared in the previously parsed text.

This can help you create grammar rules to identify languages or, in my case, create randomly generated text that looks very much like grammatical English. I need a three-layer dictionary because at any position of a three-word combination there can be another word that creates a whole different set of combinations. I TRIED to explain trigrams and the purpose behind them to the best of my ability... granted, I just started the class a couple of weeks ago.

Now... with ALL of that being said: how would I go about creating a dictionary of dictionaries of dictionaries whose innermost dictionary holds values of type int in Python?

trigram = defaultdict( defaultdict(defaultdict(int)))

throws an error for me

Upvotes: 7

Views: 9311

Answers (4)

pcurry

Reputation: 1414

The defaultdict __init__ method takes an argument that is required to be a callable. The callable passed to defaultdict must be callable with no arguments, and must return an instance of the default value.

The problem with nesting defaultdict the way you did is that each inner defaultdict(...) call is evaluated immediately. The outer defaultdict therefore receives an instance of defaultdict as its __init__ argument rather than a callable, and a defaultdict instance is not callable.

The lambda solution by @pcoving works because each lambda is an anonymous, zero-argument function that returns a new defaultdict for the next layer of nesting, with int as the factory for the innermost layer.
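To make the explanation above concrete, here is a minimal sketch. The helper names make_inner and make_middle are made up for illustration and are not part of the collections module. It shows that a defaultdict instance is not callable, that passing one as the factory raises a TypeError, and that zero-argument functions fix the nesting.

```python
from collections import defaultdict

inner = defaultdict(int)
print(callable(inner))        # False: an instance cannot serve as a factory
print(callable(defaultdict))  # True: the class itself is callable

# Passing an instance where a callable is expected fails immediately.
try:
    broken = defaultdict(inner)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", exc)

# Hypothetical helpers: each one is a zero-argument factory for one layer.
def make_inner():
    return defaultdict(int)

def make_middle():
    return defaultdict(make_inner)

trigram = defaultdict(make_middle)
trigram["the"]["boy"]["runs"] += 1
print(trigram["the"]["boy"]["runs"])
```

Named module-level functions behave the same as the lambdas here. One possible reason to prefer them is that a defaultdict whose factory is a lambda cannot be pickled, while one built from module-level functions can.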

Upvotes: 1

pcoving

Reputation: 2788

I've tried nested defaultdicts before, and the solution seems to be a lambda:

from collections import defaultdict

trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

trigram['a']['b']['c'] += 1

It's not pretty, but I suspect the nested dictionary suggestion is for efficient lookup.
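As a rough sketch of how this structure could be filled from text, the example below counts every three-word window in a token list. The function name count_trigrams and the sample sentence are assumptions made for illustration; real input would need proper tokenization.

```python
from collections import defaultdict

def count_trigrams(words):
    """Count each consecutive three-word window in a list of tokens."""
    trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    # zip over three staggered views of the list yields every window of 3.
    for first, second, third in zip(words, words[1:], words[2:]):
        trigram[first][second][third] += 1
    return trigram

# Illustrative sample text, already lowercased and split on whitespace.
words = "the boy runs and the boy runs and the girl walks".split()
counts = count_trigrams(words)

print(counts["the"]["boy"]["runs"])    # 2
print(counts["the"]["girl"]["walks"])  # 1
print(sorted(counts["the"]))           # ['boy', 'girl']
```

Note that merely reading a missing key, such as counts["a"]["b"]["c"], inserts empty entries along the way, so it is safer to test membership with `in` before looking up a combination that may not exist.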

Upvotes: 13

alvas

Reputation: 122042

If it's just extracting and retrieving trigrams, you should try this with NLTK:

>>> import nltk
>>> sent = "this is a foo bar crazycoder"
>>> # list() so the trigrams can be iterated more than once
>>> trigrams = list(nltk.ngrams(sent.split(), 3))
>>> trigrams
[('this', 'is', 'a'), ('is', 'a', 'foo'), ('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
>>> # token "a" in first element of trigram
>>> [i for i in trigrams if i[0] == "a"]
[('a', 'foo', 'bar')]
>>> # token "a" in 2nd element of trigram
>>> [i for i in trigrams if i[1] == "a"]
[('is', 'a', 'foo')]
>>> # token "a" in third element of trigram
>>> [i for i in trigrams if i[2] == "a"]
[('this', 'is', 'a')]
>>> # look for 2gram in trigrams
>>> [i for i in trigrams if "foo" in i and "bar" in i]
[('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
>>> # look for a perfect 3gram (compare tuple to tuple)
>>> [i for i in trigrams if tuple("foo bar crazycoder".split()) == i]
[('foo', 'bar', 'crazycoder')]
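If NLTK is not available, roughly the same extraction can be sketched with the standard library alone. The helper name trigrams_of is hypothetical, and a Counter keyed by 3-tuples is swapped in here as a flat alternative to the nested dictionaries; it is not what NLTK does internally.

```python
from collections import Counter

def trigrams_of(tokens):
    """Return every consecutive 3-token window as a tuple."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

tokens = "this is a foo bar crazycoder".split()
grams = trigrams_of(tokens)

# A Counter keyed by the whole tuple tracks how often each trigram occurs.
counts = Counter(grams)

print(grams[0])                     # ('this', 'is', 'a')
print(counts[("a", "foo", "bar")])  # 1
```

The flat Counter is convenient for frequency lookups of a full trigram, while the nested layout makes it easier to list every word that follows a given two-word prefix.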

Upvotes: 0

Abhijit

Reputation: 63727

Generally, to create a nested dictionary of trigrams, the already posted solutions might work. If you would like to extend the idea to a more general solution, you can do one of the following: one is adapted from Perl's autovivification and the other uses collections.defaultdict.

Solution 1:

class ngram(dict):
    """Based on perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return super(ngram, self).__getitem__(item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Solution 2:

from collections import defaultdict
class ngram(defaultdict):
    def __init__(self):
        super(ngram, self).__init__(ngram)

Demo using Solution 1

>>> a = ngram()
>>> a['two']['three']['four'] = 4
>>> a
{'two': {'three': {'four': 4}}}
>>> a['two']
{'three': {'four': 4}}
>>> a['two']['three']
{'four': 4}
>>> a['two']['three']['four']
4

Demo using Solution 2

>>> a = ngram()
>>> a['two']['three']['four'] = 4
>>> a
defaultdict(<class '__main__.ngram'>, {'two': defaultdict(<class '__main__.ngram'>, {'three': defaultdict(<class '__main__.ngram'>, {'four': 4})})})
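One caveat with both autovivifying classes is that a missing leaf is itself an ngram, so incrementing an unseen key with += 1 would try to add an int to a dictionary and fail. For counting, a possible variation is a depth-limited factory whose innermost layer is a defaultdict(int). The name nested_counter and its depth parameter are assumptions made for this sketch.

```python
from collections import defaultdict

def nested_counter(depth):
    """Build a defaultdict nested `depth` levels deep with int leaves."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        # Innermost layer: missing keys start at 0, so += 1 works.
        return defaultdict(int)
    # Outer layers: each missing key creates the next layer down.
    return defaultdict(lambda: nested_counter(depth - 1))

trigram = nested_counter(3)
trigram["the"]["boy"]["runs"] += 1
trigram["the"]["boy"]["runs"] += 1
print(trigram["the"]["boy"]["runs"])  # 2
```

Changing the argument gives bigram or 4-gram counters with the same code, at the cost of a fixed depth that the fully autovivifying classes do not impose.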

Upvotes: 6
