Rose_Trojan

Reputation: 117

Iteratively populate a nested dictionary in Python

I am working on dynamically populating a nested dictionary with data from MongoDB. I am not that well-versed in using dictionaries, so please bear with me. I have checked over and over and tried different approaches, but I still keep on getting the same incorrect result.

The data I am trying to feed into the dictionary is not in a tuple, as I have seen in the questions I have checked, but in a collection from MongoDB.

This is what my collection fields look like:

new_crawl_130422_data.insert_one(
        {
        "database_url": proj_database_url,
        "database_project_id": proj_database_id,
        "projectname": proj_database_name,
        "version": version,
        "boost": boost,
        "content": content,
        "digest": digest,
        "title": title,
        "timestamp": timestamp,
        "url": website,
        "language": language
        }
)

The language field can hold different languages for a given project_id. So in essence, I have a number of records per project_id, and some of them are in different languages. What I am trying to do is create a nested dictionary with the project_id as the outer key and the different languages as the inner keys. So I should have something like:

{Project_id1: {'it': "text here in Italian if it exists in the collection", 'en': "text here in English if it exists", 'de': "text here in German if it exists"}}
{Project_id2: {'en': "text here in English if it exists in the collection", 'fr': "text here in French if it exists", 'de': "text here in German if it exists"}}

etc.

Hence, as it iterates through the records, it should pick a language and make that the key, and pick the 'content' as the value. Another aspect is that if there is already that language key in the dictionary, it should append the text with the matching language to the value. I don't know if this is too much for a dictionary?

So far, I have tried the following feeble attempts, and have gotten the same result, which is only the last record and language read (it's overwriting, not appending) and also, it's not concatenating the texts.

project_details = {}

for row in results:
    idProject = row[0]
    documents = mongo_db.new_collection_Eus.find(
       {"database_project_id": idProject},
       no_cursor_timeout=True).batch_size(100)

    for doc in documents:
        project_details[doc['database_project_id']] = {}

        [project_details[doc['database_project_id']][doc['language']]] = [doc['content']]

        for k,v in project_details[doc['database_project_id']].items():
            if k in [project_details[doc['database_project_id']]]:
                k[v] = project_details[doc['database_project_id']][doc['language']].append([doc['content']])

            else:
                [project_details[doc['database_project_id']][doc['language']]] = [doc['content']]

I also tried this:

for row in results:
    idProject = row[0]
    documents = mongo_db.new_collection_Eus.find(
       {"database_project_id": idProject},
       no_cursor_timeout=True).batch_size(100)

    for doc in documents:
        project_details[doc['database_project_id']] = {}

        if doc['language'] not in project_details[doc['database_project_id']].keys():

            project_details[doc['database_project_id']][doc['language']] = doc['content']
        else:
            
            project_details[doc['database_project_id']][doc['language']] = project_details[doc['database_project_id']][doc['language']] + ' ' + doc['content']
            

They both give the same result, only one language, even though there are many languages in the records, and the text is not concatenated per language in the dictionary.

I have looked through these questions

Any help will be greatly appreciated, as I'm quite stuck on this.

Upvotes: 1

Views: 287

Answers (1)

Pierre D

Reputation: 26211

I think that's a good job for defaultdict:

# simple setup for example
test = [
    (12, 'it', 'Buongiorno'),
    (12, 'it', 'tutti'),
    (12, 'fr', 'Salut'),
    (12, 'fr', 'tout le monde'),
    (12, 'en', 'Hello'),
    (12, 'en', 'world'),
    (13, 'en', 'and now'),
    (13, 'en', 'for something completely different'),
]

# shuffle into a nested default dict: d[proj_id][lang]: list
from collections import defaultdict

d = defaultdict(lambda: defaultdict(list))
for proj_id, lang, text in test:
    d[proj_id][lang].append(text)

>>> d
defaultdict(<function __main__.<lambda>()>,
            {12: defaultdict(list,
                         {'it': ['Buongiorno', 'tutti'],
                          'fr': ['Salut', 'tout le monde'],
                          'en': ['Hello', 'world']}),
             13: defaultdict(list,
                         {'en': ['and now',
                           'for something completely different']})})
>>> list(d[12])
['it', 'fr', 'en']

>>> d[12]['fr']
['Salut', 'tout le monde']
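
The same pattern should carry over to the MongoDB documents from the question, which are dicts rather than tuples. Here is a minimal sketch, assuming each document carries the `database_project_id`, `language` and `content` fields shown in the question. The `docs` list is made-up sample data standing in for the cursor returned by `find()`:

```python
from collections import defaultdict

# Hypothetical sample documents standing in for a MongoDB cursor.
docs = [
    {"database_project_id": 12, "language": "it", "content": "Buongiorno"},
    {"database_project_id": 12, "language": "it", "content": "tutti"},
    {"database_project_id": 12, "language": "en", "content": "Hello"},
    {"database_project_id": 13, "language": "en", "content": "and now"},
]

project_details = defaultdict(lambda: defaultdict(list))
for doc in docs:
    # Missing keys are created on first access, so nothing is reset or overwritten.
    project_details[doc["database_project_id"]][doc["language"]].append(doc["content"])

print(dict(project_details[12]))
```

Since the inner dict is never reassigned inside the loop, every language seen for a project is kept.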

Addendum: turn into a simple dict and join multipart content

To transform d above into a simple dict, and at the same time join any multipart content into a single string (with sep as separator):

sep = ' '
d2 = {
    proj_id: {
        lang: sep.join(parts) for lang, parts in proj.items()
    } for proj_id, proj in d.items()
}

>>> d2
{12: {'it': 'Buongiorno tutti',
  'fr': 'Salut tout le monde',
  'en': 'Hello world'},
 13: {'en': 'and now for something completely different'}}
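
If you would rather stay with plain dicts, the second attempt in the question looks close. The main problem appears to be that `project_details[doc['database_project_id']] = {}` runs for every document, which throws away whatever was already stored for that project. Below is a sketch of that loop using `dict.setdefault` instead, again with made-up sample documents in place of the cursor:

```python
# Hypothetical sample documents standing in for a MongoDB cursor.
docs = [
    {"database_project_id": 12, "language": "it", "content": "Buongiorno"},
    {"database_project_id": 12, "language": "it", "content": "tutti"},
    {"database_project_id": 12, "language": "en", "content": "Hello"},
    {"database_project_id": 13, "language": "en", "content": "and now"},
]

sep = ' '
project_details = {}
for doc in docs:
    # setdefault returns the existing inner dict, or stores and returns a new one.
    by_lang = project_details.setdefault(doc["database_project_id"], {})
    lang = doc["language"]
    if lang in by_lang:
        # Language already seen for this project: concatenate the text.
        by_lang[lang] += sep + doc["content"]
    else:
        by_lang[lang] = doc["content"]

print(project_details)
```

This builds the joined strings directly, at the cost of repeated string concatenation. Collecting lists and joining once at the end, as above, is usually cheaper when there are many documents per language.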

Upvotes: 1
