Nested dictionary - iteration off

Question

I am trying to convert some data into a nested dictionary, where there is a general State key and then specific Area keys that match to numbers (as seen below).

To get the data into this better form, I wrote a painful triple loop (below). However, it is not working properly: it only saves the last number for every state/area. I have tried to adjust the code but cannot see where to fix my iteration so that it loops through each iterator and places the growth numbers accordingly. The output I am trying to get and the actual output are both below. There is no error message, but the output is not what I am aiming for.

Thank you for your help on this. I am also curious whether this is a wildly inefficient (or okay) way to iterate when there are dictionaries inside dictionaries.

state=['NJ', 'NY', 'TX', 'CA', 'OH']
area=['North','South']
growth = [['State', 'North', 'South'],
    ['NJ', '5.09', '3'],
    ['NY', '0', '1'],
    ['TX', '8','5.54'],
    ['CA', '6', '1'],
    ['OH', '7.77', '5']]

nested_dict={}
for i in range(1,len(growth)):
    nested_dict[growth[i][0]]=dict()

for i in range(0,len(state)):
    for j in range(0,len(area)):
        for k in range(1,len(growth[0])):
            nested_dict[state[i]][area[j]]=float(growth[i+1][k])

Expected Output:

{'CA': {'North': 6.0, 'South': 1.0},
 'NJ': {'North': 5.09, 'South': 3.0},
 'NY': {'North': 0.0, 'South': 1.0},
 'OH': {'North': 7.77, 'South': 5.0},
 'TX': {'North': 8.0, 'South': 5.54}}

Wrong Output:

{'CA': {'North': 1.0, 'South': 1.0},
 'NJ': {'North': 3.0, 'South': 3.0},
 'NY': {'North': 1.0, 'South': 1.0},
 'OH': {'North': 5.0, 'South': 5.0},
 'TX': {'North': 5.54, 'South': 5.54}}

Edits above to reflect comments

Martijn Pieters · Accepted Answer

You are assigning both the 2nd and 3rd values from your growth rows to all state and area pairings, with your innermost loop:

for k in range(1,len(growth[0])):

So you end up assigning twice:

# first iteration
i = 0   # state: NJ
j = 0   # area: North
k = 1   # second column in growth
nested_dict[state[0]][area[0]] = float(growth[0+1][1])
# nested_dict['NJ']['North'] = float('5.09'), because ['NJ', '5.09', '3'][1] == '5.09'

# second iteration
i = 0   # state: NJ
j = 0   # area: North
k = 2   # third column in growth
nested_dict[state[0]][area[0]] = float(growth[0+1][2])
# nested_dict['NJ']['North'] = float('3'), because ['NJ', '5.09', '3'][2] == '3'

Note how i and j have not changed!
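
To see the overwriting happen, here is a minimal sketch that runs the original loop body for NJ only and logs every assignment. The assignments list is a hypothetical helper added for illustration, and the data is cut down to the NJ row:

```python
# Shortened copy of the question's data: header row plus the NJ row only.
state = ['NJ']
area = ['North', 'South']
growth = [['State', 'North', 'South'],
          ['NJ', '5.09', '3']]

nested_dict = {'NJ': {}}
assignments = []  # hypothetical log, not part of the original code

i = 0  # NJ
for j in range(len(area)):
    for k in range(1, len(growth[0])):
        value = float(growth[i + 1][k])
        nested_dict[state[i]][area[j]] = value
        # Record which key was written and with which value.
        assignments.append((state[i], area[j], value))

for entry in assignments:
    print(entry)
print(nested_dict)
```

Each area is written twice, first with the North column and then with the South column, so only the South value survives for both keys.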

You don't need that third loop at all; you have already picked your state and area. Just pick out the right value from the table using the area index, plus 1 to map to the right column:

for i in range(len(state)):
    for j in range(len(area)):
        nested_dict[state[i]][area[j]] = float(growth[i + 1][j + 1])
        #                                       instead of k ^^^^^

Now, using single-letter indices makes it hard to follow your code. You should really learn about the enumerate() function here, to generate indices:

for row, state_name in enumerate(state, 1):   # starting at row 1
    for col, area_name in enumerate(area, 1): # starting at column 1
        nested_dict[state_name][area_name] = float(growth[row][col])

By looping directly over state, and using enumerate() to add an index starting at 1, you get both the state name ('NJ') and the right index into growth (1). The same happens for the area list (so 'North' and 1, or 'South' and 2).
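
As a quick check of what enumerate() yields with a start value of 1, a small sketch (the variable names state_pairs and area_pairs are only for illustration):

```python
state = ['NJ', 'NY', 'TX', 'CA', 'OH']
area = ['North', 'South']

# enumerate(iterable, 1) pairs each item with a counter that starts at 1,
# which matches the row and column offsets in growth (index 0 holds headers).
state_pairs = list(enumerate(state, 1))
area_pairs = list(enumerate(area, 1))

print(state_pairs)
print(area_pairs)
```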

However, you already have all your row and column names in the growth matrix, so you can just generate all your dictionaries directly with that:

area_names = growth[0][1:]
nested_dict = {
    state: {area: float(value) for area, value in zip(area_names, row)}
    for state, *row in growth[1:]
}

The zip() function pairs the area names (found on the first row of growth) with the values from each row (stored in row, with the first value diverted to state).
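
A minimal sketch of that pairing for a single row, using the NJ values from the question (pairs and inner are illustrative names):

```python
area_names = ['North', 'South']
row = ['5.09', '3']  # the NJ row with the state name already removed

# zip() walks both lists in step and yields (area, value) tuples.
pairs = list(zip(area_names, row))
print(pairs)

# The inner dict comprehension converts each value to float.
inner = {area: float(value) for area, value in pairs}
print(inner)
```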

Note the for state, *row in .. loop; Python unpacks each list from growth into two variables; the first value is stored in state, and because row is prefixed with *, all remaining values are stored in row:

>>> state, *row = ['NJ', '5.09', '3']
>>> state
'NJ'
>>> row
['5.09', '3']

That's really all you need:

>>> area_names = growth[0][1:]
>>> nested_dict = {
...     state: {area: float(value) for area, value in zip(area_names, row)}
...     for state, *row in growth[1:]
... }
>>> nested_dict
{'NJ': {'North': 5.09, 'South': 3.0}, 'NY': {'North': 0.0, 'South': 1.0}, 'TX': {'North': 8.0, 'South': 5.54}, 'CA': {'North': 6.0, 'South': 1.0}, 'OH': {'North': 7.77, 'South': 5.0}}
>>> from pprint import pprint
>>> pprint(_)
{'CA': {'North': 6.0, 'South': 1.0},
 'NJ': {'North': 5.09, 'South': 3.0},
 'NY': {'North': 0.0, 'South': 1.0},
 'OH': {'North': 7.77, 'South': 5.0},
 'TX': {'North': 8.0, 'South': 5.54}}

Last but not least, if all this data came from a CSV file, perhaps you should use the csv module instead:

import csv

nested_dict = {}
with open(somefile, 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        state = row.pop('State')
        nested_dict[state] = {k: float(v) for k, v in row.items()}

A DictReader() object takes the first row of a CSV file as the column names, and produces a dictionary for each row. If your state column uses quotes around the names, you can even use quoting=csv.QUOTE_NONNUMERIC to have the module automatically convert anything that is not quoted into a float() value for you.
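
Here is a sketch of the QUOTE_NONNUMERIC variant. It assumes a file where the header and state names are quoted and the numbers are not; the sample text is made up for illustration and is read from an io.StringIO object instead of a real file:

```python
import csv
import io

# Assumed layout: header and state names quoted, numbers unquoted.
sample = io.StringIO(
    '"State","North","South"\n'
    '"NJ",5.09,3\n'
    '"NY",0,1\n'
)

nested_dict = {}
# QUOTE_NONNUMERIC makes the reader convert every unquoted field to float.
reader = csv.DictReader(sample, quoting=csv.QUOTE_NONNUMERIC)
for row in reader:
    state = row.pop('State')
    nested_dict[state] = dict(row)  # values are already floats

print(nested_dict)
```

Note that the header row must be quoted too in this mode, since an unquoted header such as State cannot be converted to a float.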
