question610

Reputation: 41

Creating a nested dictionary from a csv file in Python

I am reading a CSV file and want to use a nested dictionary to group the repetitive information in it. How do I build that nested dictionary from all rows of the file? Here is an example of the data (not the actual data, but the same concept):

State ,City/Region ,Questions ,Answers 
NY,Manhattan ,East/West Coast? ,East 
NY,Manhattan ,been there? ,yes
NY,Brooklyn ,East/West Coast? ,East 
NY,Brooklyn ,been there? ,yes
NY,Brooklyn ,Been to coney island? ,yes
NY,Queens ,East/West Coast? ,East 
NY,Queens ,been there? ,yes
NY ,Staten Island ,is island? ,yes
MA,Boston ,East/West Coast? ,East 
MA,Boston ,like it there? ,yes
MA,Pioneer Valley ,East/West Coast? ,East 
MA,Pioneer Valley ,city? ,no
MA,Pioneer Valley ,college town? ,yes
CA,Bay Area ,warm? ,yes
CA ,Bay Area ,East/West Coast? ,West 
CA ,SoCal ,north or south? ,south 
CA ,SoCal ,warm ,yes 

So essentially, the master dictionary has 3 keys: NY, MA, CA. Each of them maps to a dictionary with City/Region as key, and each City/Region maps to its questions and answers.
The result is a deeply nested dictionary, and I can't figure out the syntax to build it for every row in the file.

I've tried opening the file, looping over its lines, and splitting each line on ",". Something like this:

for line in my_file:
    line=line.split(",") 
    MasterDict[line[0]] = {line[1] : {} }
    MasterDict[line[0]][line[1]] = {line[2] : line[3]}

Upvotes: 1

Views: 2396

Answers (3)

Thierry Lathuille

Reputation: 24232

import csv
from collections import defaultdict
from functools import partial

defaultdict_of_dict = partial(defaultdict, dict)
master = defaultdict(defaultdict_of_dict)

with open("data.txt", 'r') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # Skip the first line
    for row in csv_reader:
        state, city, question, answer = [field.strip() for field in row]
        master[state][city][question] = answer


print(master['NY']['Queens'])
# {'been there?': 'yes', 'East/West Coast?': 'East'}
print(master['NY']['Queens']['been there?'])
# yes

You can read the CSV file with the csv module, which takes care of the splitting.

The example data you gave is full of unneeded spaces. In case your real data looks the same, we sanitize each field with strip.
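To see why the strip matters, here is a small sketch that parses two rows copied from the question. The io.StringIO object only stands in for the real file, and the names sample, raw_rows and clean_rows are illustrative. Without the strip, 'NY ' and 'NY' would end up as two different keys.

```python
import csv
import io

# Two sample rows copied from the question, including their stray spaces.
sample = "NY ,Staten Island ,is island? ,yes\nCA ,SoCal ,warm ,yes \n"

# csv.reader splits on commas but keeps the surrounding whitespace.
raw_rows = list(csv.reader(io.StringIO(sample)))

# Stripping every field gives clean keys and values.
clean_rows = [[field.strip() for field in row] for row in raw_rows]

print(raw_rows[0])
# ['NY ', 'Staten Island ', 'is island? ', 'yes']
print(clean_rows[0])
# ['NY', 'Staten Island', 'is island?', 'yes']
```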

To avoid having to create the missing keys in your dictionaries yourself, you can use a defaultdict. It creates missing keys on the fly with a default value.

For example, you could do:

from collections import defaultdict
d = defaultdict(dict)

to create a defaultdict with empty dicts as default values for missing keys, and use it like this:

d["new_key"]["subkey"] = 5
print(d)
# defaultdict(<class 'dict'>, {'new_key': {'subkey': 5}})

There's one difficulty in your case: you want a nested dictionary, so we need a defaultdict of defaultdict of dict.

The argument we give to defaultdict must be a callable, so we can't write something like defaultdict(defaultdict(dict)): defaultdict(dict) is a defaultdict instance, not a function. One way around this is to use functools.partial to create a defaultdict_of_dict function that we can pass to the outer defaultdict.
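Any callable that returns a fresh defaultdict(dict) would do here, so a lambda is a possible alternative to functools.partial. A minimal sketch, filled with a few values from the question's sample data rather than read from a file:

```python
from collections import defaultdict

# Each missing state gets a fresh defaultdict(dict);
# each missing city inside a state gets a plain dict.
master = defaultdict(lambda: defaultdict(dict))

master["NY"]["Queens"]["East/West Coast?"] = "East"
master["NY"]["Queens"]["been there?"] = "yes"
master["CA"]["SoCal"]["warm"] = "yes"

print(master["NY"]["Queens"]["been there?"])
# yes
print(sorted(master))
# ['CA', 'NY']
```

One trade-off to keep in mind: a defaultdict whose factory is a lambda cannot be pickled, while the partial-based version can.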

Upvotes: 1

Ajax1234

Reputation: 71451

You can try this slightly shorter version:

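with open(myfile) as fh:  # myfile holds the path to the CSV file
    # Split every non-blank line into fields and strip stray whitespace
    f = [[field.strip() for field in line.split(',')] for line in fh if line.strip()]

# One empty dict per state; f[0] is the header row and is skipped
d = {i[0]: {} for i in f[1:]}

for i in f[1:]:
    # Each city maps to a flat list: [question, answer, question, answer, ...]
    if i[1] not in d[i[0]]:
        d[i[0]][i[1]] = i[2:]
    else:
        d[i[0]][i[1]].extend(i[2:])

print(d)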
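Note that this stores each city's questions and answers in one flat list. If a question-to-answer mapping is preferred, as described in the question, the list can be paired up afterwards. A minimal sketch, where flat and qa are illustrative names and the values are the question's Brooklyn rows with whitespace stripped:

```python
# Shape of the list stored for one city: question, answer, question, answer, ...
flat = ["East/West Coast?", "East", "been there?", "yes", "Been to coney island?", "yes"]

# Even positions hold the questions, odd positions hold the answers.
qa = dict(zip(flat[0::2], flat[1::2]))

print(qa["been there?"])
# yes
```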

Upvotes: 0

question610

Reputation: 41

I figured out how to get it to work.

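import pprint

# Assumes my_file is the CSV file, already opened for reading
MasterDict = {}
my_file.readline()  # skip the header row
for line in my_file:
    # Strip each field so that "NY " and "NY" end up under the same key
    line = [field.strip() for field in line.split(",")]
    if line[0] not in MasterDict:
        MasterDict[line[0]] = {}
    if line[1]:
        if line[1] not in MasterDict[line[0]]:
            MasterDict[line[0]][line[1]] = []
        # Store each question/answer pair as a tuple
        MasterDict[line[0]][line[1]].append((line[2], line[3]))
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(MasterDict)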
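For the question-to-answer mapping described in the question, the same loop could also be written with dict.setdefault, which removes the explicit membership checks. This is only a sketch: the io.StringIO object stands in for the opened file and contains a few rows copied from the sample data.

```python
import io

# Stand-in for the opened CSV file (assumption: same column layout as the sample data)
my_file = io.StringIO(
    "State ,City/Region ,Questions ,Answers \n"
    "NY,Queens ,East/West Coast? ,East \n"
    "NY,Queens ,been there? ,yes\n"
    "NY ,Staten Island ,is island? ,yes\n"
)

MasterDict = {}
my_file.readline()  # skip the header row
for line in my_file:
    state, city, question, answer = [field.strip() for field in line.split(",")]
    # setdefault returns the existing inner dict, or inserts and returns a new one
    MasterDict.setdefault(state, {}).setdefault(city, {})[question] = answer

print(MasterDict["NY"]["Queens"])
# {'East/West Coast?': 'East', 'been there?': 'yes'}
```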

Upvotes: 0
