Reputation: 191
I'm reading data from a file into a series of lists as follows:
sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]
wherein the sources and topics in each list may be the same or different.
What I am trying to achieve is a dictionary which groups the topics associated with each source, and their associated scores (the scores will then be averaged, but for the purpose of this question let's just list them as values of the topic (key)).
The results would ideally look like a list of nested dicts as follows:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I think the best way to do this would be to create a Counter of the sources, then a dict of topics for each source, and save each dict as the value for its corresponding source. However, I am having trouble iterating properly to get the desired result.
Here's what I have so far:
sourceDict = {}
sourceDictList = []
sourceList = []
for row in sourceData:
    # Each row is [source, topic, score]
    source = row[0]
    topic = row[1]
    score = row[2]
    sourceDict = [source, {topic: score}]
    sourceDictList.append(sourceDict)
    sourceList.append(source)
wherein sourceDictList results in the following: [[source, {topic: score}]...] (essentially reformatting the data from the original list of lists), and sourceList is just a list of all the sources (some repeating).
Then I initialize a counter and match each source from the counter against the source in each item of sourceDictList, and if they match, save the topic:score dict as the value for that source:
from collections import Counter

sourceCounter = Counter(sourceList)
for key, val in sourceCounter.items():
    for dictitem in sourceDictList:
        if dictitem[0] == key:
            # Rebinds the value on every match, so only the last one is kept
            sourceCounter[key] = dictitem[1]
But the output is only saving the last topic:score dict to each source. So instead of the desired:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I am only getting:
Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})
I am under the impression that if there is a unique key saved to a dict, it will append that key:value pair without overwriting previous ones. Am I missing something?
Appreciate any help on this.
Upvotes: 0
Views: 117
Reputation: 4640
We can simply do:
sourceData = [
['source1', 'topic1', 'score1'],
['source1', 'topic2', 'score1'],
['source1', 'topic1', 'score2'],
['source2', 'topic1', 'score1'],
['source2', 'topic2', 'score2'],
['source2', 'topic1', 'score3'],
]
sourceDict = {}
for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]
    if source not in sourceDict:
        # This will be executed when the source
        # comes for the first time.
        sourceDict[source] = {}
    if topic not in sourceDict[source]:
        # This will be executed when the topic
        # inside that source comes for the first time.
        sourceDict[source][topic] = []
    sourceDict[source][topic].append(score)
print(sourceDict)
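As for why the original attempt kept only the last dict: assigning to an existing key with `d[key] = value` replaces whatever was stored there; it never merges. The sketch below contrasts that with a more compact form of the loop above using `dict.setdefault`. The `rows` data and the names `overwritten` and `grouped` are made up for illustration and are not from the question.

```python
# Assumed sample rows in the question's [source, topic, score] layout.
rows = [
    ["source1", "topic1", 1],
    ["source1", "topic2", 2],
    ["source1", "topic1", 3],
]

# Plain assignment: each pass replaces the value stored under `source`,
# so only the dict from the last matching row survives.
overwritten = {}
for source, topic, score in rows:
    overwritten[source] = {topic: score}

# setdefault returns the existing value for the key, or stores the
# given default first and returns it, so nothing gets replaced.
grouped = {}
for source, topic, score in rows:
    grouped.setdefault(source, {}).setdefault(topic, []).append(score)

print(overwritten)  # {'source1': {'topic1': 3}}
print(grouped)      # {'source1': {'topic1': [1, 3], 'topic2': [2]}}
```

This behaves the same as the explicit `if ... not in ...` checks; which one reads better is mostly a matter of taste.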
Upvotes: 1
Reputation: 1383
You can simply use defaultdict from the collections module:
from collections import defaultdict

sourdata = [['source', 'topic', 2], ['source', 'topic', 3], ['source', 'topic2', 3], ['source2', 'topic', 4]]
sourceDict = defaultdict(dict)
for source, topic, score in sourdata:
    # A missing source gets an empty dict automatically
    topicScoreDict = sourceDict[source]
    topicScoreDict[topic] = topicScoreDict.get(topic, []) + [score]
>>> print(sourceDict)
defaultdict(<class 'dict'>, {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}})
>>> print(dict(sourceDict))
{'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}
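Since the question mentions that the scores will eventually be averaged, here is one possible sketch of that last step. It nests a `defaultdict(list)` inside another `defaultdict` so that `append` can be used directly, then averages with `statistics.mean`. The sample `rows` and the names `grouped` and `averages` are assumptions for illustration, and the scores are assumed to be numeric.

```python
from collections import defaultdict
from statistics import mean

# Assumed sample rows in [source, topic, score] order with numeric scores.
rows = [
    ["source", "topic", 2],
    ["source", "topic", 3],
    ["source", "topic2", 3],
    ["source2", "topic", 4],
]

# Outer default: a new defaultdict(list) per unseen source.
# Inner default: a new empty list per unseen topic.
grouped = defaultdict(lambda: defaultdict(list))
for source, topic, score in rows:
    grouped[source][topic].append(score)

# Collapse each list of scores into its mean, as plain nested dicts.
averages = {
    source: {topic: mean(scores) for topic, scores in topics.items()}
    for source, topics in grouped.items()
}
print(averages)
```

With the rows above, the two scores for 'topic' under 'source' average to 2.5. If the scores are read from a file as strings, they would need converting (for example with `float`) before averaging.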
Upvotes: 0