bobman

Reputation: 123

Creating a complex nested dictionary from multiple lists in Python

I am struggling to create a nested dictionary with the following data:

Team,       Group,  ID,  Score,  Difficulty
OneTeam,    A,      0,   0.25,   4
TwoTeam,    A,      1,   1,      10
ThreeTeam,  A,      2,   0.64,   5
FourTeam,   A,      3,   0.93,   6
FiveTeam,   B,      4,   0.5,    7
SixTeam,    B,      5,   0.3,    8
SevenTeam,  B,      6,   0.23,   9
EightTeam,  B,      7,   1.2,    4

Once the data is imported as a pandas DataFrame, I turn each column into a list: teams, group, id, score, diff.

Using the Stack Overflow answer "Create a complex dictionary using multiple lists", I can create the following dictionary:

{'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
 'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
 'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
 'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
 'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
 'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3},
 'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
 'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}}

using the code:

{team: {'id': i, 'score': s, 'diff': d} for team, i, s, d in zip(teams, id, score, diff)}

But what I'm after is having 'Group' as the main key, then team, and then id, score and difficulty within the team (as above).

I have tried:

{g: {team: {'id': i, 'score': s, 'diff': d}} for g, team, i, s, d in zip(group, teams, id, score, diff)}

but this doesn't work and results in only one team per group within the dictionary:

{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93}},
 'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2}}}

Below is how the dictionary should look, but I'm not sure how to get there - any help would be much appreciated!

{'A': {'FourTeam': {'diff': 6, 'id': 3, 'score': 0.93},
  'OneTeam': {'diff': 4, 'id': 0, 'score': 0.25},
  'ThreeTeam': {'diff': 5, 'id': 2, 'score': 0.64},
  'TwoTeam': {'diff': 10, 'id': 1, 'score': 1.0}},
 'B': {'EightTeam': {'diff': 4, 'id': 7, 'score': 1.2},
  'FiveTeam': {'diff': 7, 'id': 4, 'score': 0.5},
  'SevenTeam': {'diff': 9, 'id': 6, 'score': 0.23},
  'SixTeam': {'diff': 8, 'id': 5, 'score': 0.3}}}

Upvotes: 2

Views: 787

Answers (2)

JohnO

Reputation: 777

A dict comprehension may not be the best way of solving this if your data is stored in a table like this.
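To see why the attempted comprehension keeps only one team per group: a dict comprehension assigns each key as it goes, so a later row with the same group replaces the earlier inner dict instead of extending it. A minimal sketch with made-up lists (the names `groups_col` and `teams_col` are only for illustration):

```python
# Illustrative data only: two groups, two teams each.
groups_col = ['A', 'A', 'B', 'B']
teams_col = ['OneTeam', 'TwoTeam', 'FiveTeam', 'SixTeam']

# Each iteration assigns flat[g] afresh, so the last row of a group wins.
flat = {g: {team: {}} for g, team in zip(groups_col, teams_col)}
print(flat)  # {'A': {'TwoTeam': {}}, 'B': {'SixTeam': {}}}
```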

Try something like

from collections import defaultdict
groups = defaultdict(dict)
for g, team, i, s, d in zip(group, teams, id, score, diff):
    groups[g][team] = {'id': i, 'score': s, 'diff': d }

With defaultdict, if groups[g] already exists, the new team is simply added as a key; if it doesn't, an empty dict is created automatically and the new team is inserted into it.

Edit: you edited your question to say that your data is in a pandas DataFrame. In that case you can skip the step of turning the columns into lists and instead do, for example:
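To make that behaviour concrete, here is a small sketch comparing `defaultdict(dict)` with the equivalent `dict.setdefault` call on a plain dict. The sample rows are taken from the question's table; the names `rows`, `plain` and `entry` are only illustrative.

```python
from collections import defaultdict

# (group, team, id, score, difficulty) for three rows of the table
rows = [('A', 'OneTeam', 0, 0.25, 4),
        ('A', 'TwoTeam', 1, 1, 10),
        ('B', 'FiveTeam', 4, 0.5, 7)]

groups = defaultdict(dict)
plain = {}
for g, team, i, s, d in rows:
    entry = {'id': i, 'score': s, 'diff': d}
    groups[g][team] = entry                 # a missing groups[g] is created as {}
    plain.setdefault(g, {})[team] = entry   # same effect without defaultdict

print(dict(groups) == plain)  # True
print(sorted(groups['A']))    # ['OneTeam', 'TwoTeam']
```

Converting with `dict(groups)` at the end gives a plain dict, which avoids accidentally creating empty groups when a missing key is looked up later.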

from collections import defaultdict
groups = defaultdict(dict)
for row in df.itertuples():
    groups[row.Group][row.Team] = {'id': row.ID, 'score': row.Score, 'diff': row.Difficulty} 

Upvotes: 3

bracco23

Reputation: 2211

If you absolutely want to use a comprehension, this should work:

# list() so the rows can be rescanned for every group
# (in Python 3, zip is a one-shot iterator)
rows = list(zip(teams, group, id, score, diff))
result = {  # outer dict, one entry for each distinct group
    grp: {  # inner dict, one entry per team, filtered by group
        team: {'id': i, 'score': s, 'diff': d}
        for team, g, i, s, d in rows
        if g == grp
    }
    for grp in set(group)
}

I added line breaks for clarity.

EDIT:

After the comment, to better clarify my intention and out of curiosity, I ran a comparison:
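One thing to watch for with this approach: in Python 3, `zip` returns a one-shot iterator, so a comprehension that rescans the zipped rows once per group needs them materialised first (for example with `list`). A minimal sketch of the difference, using throwaway data:

```python
letters = ['a', 'b']
numbers = [1, 2]

pairs_iter = zip(letters, numbers)        # one-shot iterator in Python 3
first_pass = [p for p in pairs_iter]
second_pass = [p for p in pairs_iter]     # nothing left to iterate over

pairs_list = list(zip(letters, numbers))  # a list can be scanned repeatedly
print(first_pass)                # [('a', 1), ('b', 2)]
print(second_pass)               # []
print(pairs_list == first_pass)  # True
```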

from collections import defaultdict
import timeit

teams = ['OneTeam', 'TwoTeam', 'ThreeTeam', 'FourTeam', 'FiveTeam', 'SixTeam', 'SevenTeam', 'EightTeam']
group = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
id = [0, 1, 2, 3, 4, 5, 6, 7]
score = [0.25, 1, 0.64, 0.93, 0.5, 0.3, 0.23, 1.2]
diff = [4, 10, 5, 6, 7, 8, 9, 4]

def no_comprehension():
    # single pass over the rows; defaultdict creates each group on first use
    groups = defaultdict(dict)
    for g, team, i, s, d in zip(group, teams, id, score, diff):
        groups[g][team] = {'id': i, 'score': s, 'diff': d}
    return groups

def comprehension():
    # list() so the rows can be rescanned for every group
    # (in Python 3, zip is a one-shot iterator)
    rows = list(zip(teams, group, id, score, diff))
    return {grp: {team: {'id': i, 'score': s, 'diff': d}
                  for team, g, i, s, d in rows if g == grp}
            for grp in set(group)}

print("no comprehension:")
print(timeit.timeit(no_comprehension, number=10000))
print("comprehension:")
print(timeit.timeit(comprehension, number=10000))

Output:

no comprehension:
0.027287796139717102
comprehension:
0.028979241847991943

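On this small sample, the two look about the same in terms of performance. Keep in mind that the comprehension rescans every row once per group, so it will likely fall behind as the number of groups grows. With my sentence above, I was just highlighting this as an alternative to the solution already posted by @JohnO.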

Upvotes: 2
