How to iterate through nested dicts (counters) and update keys recursively

Question

I'm reading data from a file into a series of lists as follows:

sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]

wherein the sources and topics in each list may be the same or different.

What I am trying to achieve is a dictionary that groups the topics associated with each source, along with their scores (the scores will later be averaged, but for the purpose of this question let's just list them as the values for each topic key).

The results would ideally look like a list of nested dicts as follows:

[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]

I think the best way to do this would be to create a Counter of the sources, then a dict of topics for each source, and save each dict as the value for the corresponding source. However, I am having trouble iterating properly to get the desired result.

Here's what I have so far:

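sourceDict = {}
sourceDictList = []
sourceList = []  # was never initialized in the original snippet

for row in sourceData:
    source = row[0]
    topic = row[1]  # rows are [source, topic, score]
    score = row[2]
    sourceDict = [source, {topic: score}]
    sourceDictList.append(sourceDict)
    sourceList.append(source)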

wherein sourceDictList ends up as [[source, {topic: score}]...] (essentially reformatting the data from the original list of lists), and sourceList is just a list of all the sources (some repeating).

Then I initialize a Counter and compare each source in the Counter with the source in each sourceDictList entry; if they match, I save the topic:score dict as the value for that key:

from collections import Counter

sourceCounter = Counter(sourceList)

for key,val in sourceCounter.items():
    for dictitem in sourceDictList:
        if dictitem[0] == key:
            sourceCounter[key] = dictitem[1]

But the output only keeps the last topic:score dict for each source. So instead of the desired:

[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]

I am only getting:

Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})

I am under the impression that if there is a unique key saved to a dict, it will append that key:value pair without overwriting previous ones. Am I missing something?

Appreciate any help on this.

Dipen Dadhaniya · Accepted Answer

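In your second loop, sourceCounter[key] = dictitem[1] replaces whatever was already stored under that source each time a match is found, so only the last topic:score dict survives. Instead, we can simply build the nested dict directly, creating the inner dict and the score list the first time each source and topic appears: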

sourceData = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score1'],
    ['source1', 'topic1', 'score2'],

    ['source2', 'topic1', 'score1'],
    ['source2', 'topic2', 'score2'],
    ['source2', 'topic1', 'score3'],
]

sourceDict = {}

for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]

    if source not in sourceDict:
        # This will be executed when the source
        # comes for the first time.
        sourceDict[source] = {}

    if topic not in sourceDict[source]:
        # This will be executed when the topic
        # inside that source comes for the first time.
        sourceDict[source][topic] = []

    sourceDict[source][topic].append(score)

print(sourceDict)
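
Since the question mentions that the scores will be averaged afterwards, here is a minimal sketch of the same grouping using collections.defaultdict, followed by an averaging step. This is an alternative to the explicit "if ... not in" checks above, not part of the original answer. The function names group_scores and average_scores and the numeric scores are assumptions made for illustration; the rows in the question only contain placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical numeric scores, assumed for illustration only.
source_data = [
    ['source1', 'topic1', 1.0],
    ['source1', 'topic2', 2.0],
    ['source1', 'topic1', 3.0],
    ['source2', 'topic1', 4.0],
    ['source2', 'topic2', 5.0],
    ['source2', 'topic1', 6.0],
]


def group_scores(rows):
    """Group scores by source, then by topic."""
    # Missing sources get a new inner defaultdict; missing topics get a new list.
    grouped = defaultdict(lambda: defaultdict(list))
    for source, topic, score in rows:
        grouped[source][topic].append(score)
    # Convert to plain dicts so the result prints and compares like a normal dict.
    return {source: dict(topics) for source, topics in grouped.items()}


def average_scores(grouped):
    """Replace each list of scores with its arithmetic mean."""
    return {
        source: {topic: mean(scores) for topic, scores in topics.items()}
        for source, topics in grouped.items()
    }


grouped = group_scores(source_data)
averages = average_scores(grouped)
print(grouped)
print(averages)
```

The defaultdict removes the need to check whether a source or topic has been seen before: looking up a missing key creates the empty container automatically, and append then adds to the list instead of replacing it.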