Reputation: 41
I am reading information from a CSV file and want to use a nested dictionary to map out the repetitive information in it. How do I build such a nested dictionary from all rows of the file? Here is an example of the data (not the actual data, but the same concept):
State ,City/Region ,Questions ,Answers
NY,Manhattan ,East/West Coast? ,East
NY,Manhattan ,been there? ,yes
NY,Brooklyn ,East/West Coast? ,East
NY,Brooklyn ,been there? ,yes
NY,Brooklyn ,Been to coney island? ,yes
NY,Queens ,East/West Coast? ,East
NY,Queens ,been there? ,yes
NY ,Staten Island ,is island? ,yes
MA,Boston ,East/West Coast? ,East
MA,Boston ,like it there? ,yes
MA,Pioneer Valley ,East/West Coast? ,East
MA,Pioneer Valley ,city? ,no
MA,Pioneer Valley ,college town? ,yes
CA,Bay Area ,warm? ,yes
CA ,Bay Area ,East/West Coast? ,West
CA ,SoCal ,north or south? ,south
CA ,SoCal ,warm ,yes
Essentially, the master dictionary has three keys (NY, MA, CA). Each of them maps to a dictionary keyed by City/Region, and each City/Region maps to its questions and answers.
So it would be a deeply nested dictionary, but I can't figure out the syntax to build it for every row in the file.
I've tried opening the file, looping over the lines with a for loop, and splitting each line on ",". Something like this:
for line in my_file:
    line=line.split(",")
    MasterDict[line[0]] = {line[1] : {} }
    MasterDict[line[0]][line[1]] = {line[2] : line[3]}
Upvotes: 1
Views: 2396
Reputation: 24232
import csv
from collections import defaultdict
from functools import partial
defaultdict_of_dict = partial(defaultdict, dict)
master = defaultdict(defaultdict_of_dict)
with open("data.txt", 'r') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # Skip the first line
    for row in csv_reader:
        state, city, question, answer = [field.strip() for field in row]
        master[state][city][question] = answer
print(master['NY']['Queens'])
# {'been there?': 'yes', 'East/West Coast?': 'East'}
print(master['NY']['Queens']['been there?'])
# yes
You can read the CSV file with the csv module, which takes care of the splitting.
The example data you gave is full of unneeded spaces. In case your real data looks the same, we sanitize each field with strip.
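As a minimal sketch of that reading and stripping step on its own, the snippet below parses a few rows from an in-memory string instead of a file. The sample text and the names sample, header and rows are only for illustration; they are not part of the original answer.

```python
import csv
import io

# Hypothetical in-memory sample that mimics the stray spaces in the question's data.
sample = io.StringIO(
    "State ,City/Region ,Questions ,Answers\n"
    "NY,Manhattan ,East/West Coast? ,East\n"
    "NY ,Staten Island ,is island? ,yes\n"
)

reader = csv.reader(sample)
# csv.reader splits on commas; strip() removes the surrounding whitespace.
header = [field.strip() for field in next(reader)]
rows = [[field.strip() for field in row] for row in reader]

print(header)   # ['State', 'City/Region', 'Questions', 'Answers']
print(rows[1])  # ['NY', 'Staten Island', 'is island?', 'yes']
```

Without the strip call, "NY" and "NY " would end up as two different keys in the resulting dictionary.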
To avoid having to create the missing keys in your dictionaries yourself, you can use a defaultdict. It creates missing keys on the fly with a default value.
For example, you could do:
from collections import defaultdict
d = defaultdict(dict)
to create a defaultdict with empty dicts as default values for missing keys, and use it like this:
d["new_key"]["subkey"] = 5
print(d)
# defaultdict(<class 'dict'>, {'new_key': {'subkey': 5}})
There's one difficulty in your case: you want a nested dictionary, so we need a defaultdict of defaultdict of dict. The parameter we give to defaultdict must be a callable, so we can't write something like defaultdict(defaultdict(dict)), as defaultdict(dict) is a defaultdict, not a function. One way to accomplish that is to use functools.partial to create a defaultdict_of_dict function that we can pass to the main defaultdict.
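To make the callable requirement concrete, here is a small sketch. It shows the error raised when an instance is passed, compares the partial factory with a lambda (an alternative that the answer does not mention), and converts the result back to plain dicts. The names with_partial, with_lambda and to_plain are hypothetical helpers for this illustration.

```python
from collections import defaultdict
from functools import partial

# Passing a defaultdict instance instead of a callable fails right away.
try:
    defaultdict(defaultdict(dict))
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError

# Two factories that behave the same way for this purpose.
with_partial = defaultdict(partial(defaultdict, dict))
with_lambda = defaultdict(lambda: defaultdict(dict))

with_partial["NY"]["Queens"]["been there?"] = "yes"
with_lambda["NY"]["Queens"]["been there?"] = "yes"
print(with_partial == with_lambda)  # True


def to_plain(d):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    return {k: (to_plain(v) if isinstance(v, dict) else v) for k, v in d.items()}


plain = to_plain(with_partial)
print(plain)  # {'NY': {'Queens': {'been there?': 'yes'}}}
```

Converting to plain dicts at the end can be useful because a defaultdict silently creates an entry whenever a missing key is looked up. One practical difference between the two factories is that a defaultdict built with a lambda cannot be pickled, while the partial version can.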
Upvotes: 1
Reputation: 71451
You can try this slightly shorter version:
f = open(myfile).readlines()
f = [i.strip('\n').split(',') for i in f]
d = {i[0]:{i[1]:[]} for i in f[1:]}
for i in f[1:]:
    if i[1] not in d[i[0]]:
        d[i[0]][i[1]] = i[2:]
    else:
        d[i[0]][i[1]].extend(i[2:])
print(d)
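Note that the loop above collects each city's questions and answers into one flat list. If the goal is the question-to-answer mapping described in the question, a possible variant using only plain dicts and dict.setdefault might look like the sketch below. The lines list is hypothetical sample input standing in for the file contents, and the stripping of fields is an added assumption.

```python
# Hypothetical sample input; in practice these lines would come from the file.
lines = [
    "State ,City/Region ,Questions ,Answers",
    "CA,Bay Area ,warm? ,yes",
    "CA ,Bay Area ,East/West Coast? ,West",
    "CA ,SoCal ,north or south? ,south",
]

d = {}
for line in lines[1:]:  # skip the header row
    state, city, question, answer = [field.strip() for field in line.split(",")]
    # setdefault returns the existing inner dict, or inserts and returns a new one.
    d.setdefault(state, {}).setdefault(city, {})[question] = answer

print(d["CA"]["Bay Area"])
# {'warm?': 'yes', 'East/West Coast?': 'West'}
```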
Upvotes: 0
Reputation: 41
I figured out how to get it to work.
import pprint
MasterDict={}
my_file.readline()
for line in my_file:
    line=line.split(",")
    if line[0] not in MasterDict:
        MasterDict[line[0]] = {}
    if line[1]:
        if line[1] not in MasterDict[line[0]]:
            MasterDict[line[0]][line[1]] = []
        MasterDict[line[0]][line[1]].append((line[2], line[3]))
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(MasterDict)
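One thing to watch for with this approach is that the fields are not stripped, so with data like the example a key such as "NY " is treated as different from "NY", and the last field keeps its trailing newline. The sketch below illustrates the difference. The function build and its clean flag are hypothetical names used only here, and the two input lines are taken from the example data.

```python
import pprint


def build(lines, clean):
    """Group (question, answer) tuples by state and city, as in the loop above."""
    master = {}
    for line in lines:
        fields = line.split(",")
        if clean:
            # Remove surrounding spaces and the trailing newline from every field.
            fields = [field.strip() for field in fields]
        state, city, question, answer = fields
        master.setdefault(state, {}).setdefault(city, []).append((question, answer))
    return master


lines = [
    "NY,Queens ,been there? ,yes\n",
    "NY ,Staten Island ,is island? ,yes\n",
]

raw = build(lines, clean=False)
cleaned = build(lines, clean=True)

print(sorted(raw))      # ['NY', 'NY ']
print(sorted(cleaned))  # ['NY']
pprint.pprint(cleaned)
```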
Upvotes: 0