Kernel Panic
Kernel Panic

Reputation: 55

Using Python CSV DictReader to create multi-level nested dictionary

Total Python noob here, probably missing something obvious. I've searched everywhere and haven't found a solution yet, so I thought I'd ask for some help.

I'm trying to write a function that will build a nested dictionary from a large csv file. The input file is in the following format:

Product,Price,Cost,Brand,
blue widget,5,4,sony,
red widget,6,5,sony,
green widget,7,5,microsoft,
purple widget,7,6,microsoft,

etc...

The output dictionary I need would look like:

projects = { `<Brand>`: { `<Product>`: { 'Price': `<Price>`, 'Cost': `<Cost>` },},}

But obviously with many different brands containing different products. In the input file, the data is ordered alphabetically by brand name, but I know that it becomes unordered as soon as DictReader executes, so I definitely need a better way to handle the duplicates. The if statement as written is redundant and unnecessary.

Here's the non-working, useless code I have so far:

def build_dict(source_file):
  projects = {}
  headers = ['Product', 'Price', 'Cost', 'Brand']
  reader = csv.DictReader(open(source_file), fieldnames = headers, dialect = 'excel')
  current_brand = 'None'
  for row in reader:
    if Brand != current_brand:
      current_brand = Brand
    projects[Brand] = {Product: {'Price': Price, 'Cost': Cost}}
  return projects

source_file = 'merged.csv'
print build_dict(source_file)

I have of course imported the csv module at the top of the file.

What's the best way to do this? I feel like I'm way off course, but there is very little information available about creating nested dicts from a CSV, and the examples that are out there are highly specific and tend not to go into detail about why the solution actually works, so as someone new to Python, it's a little hard to draw conclusions.

Also, the input csv file doesn't normally have headers, but for the sake of trying to get a working version of this function, I manually inserted a header row. Ideally, there would be some code that assigns the headers.

Any help/direction/recommendation is much appreciated, thanks!

Upvotes: 3

Views: 7408

Answers (2)

LancelotHolmes
LancelotHolmes

Reputation: 699

Here I offer another way to satisfy your requirement(different from DSM) Firstly, this is my code:

import csv

new_dict={}
with open('merged.csv','rb')as csv_file:
    data=csv.DictReader(csv_file,delimiter=",")
    for row in data:
        dict_brand=new_dict.get(row['Brand'],dict())
        dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}
        new_dict[row['Brand']]=dict_brand
print new_dict

Briefly speaking, the main point to solve is to figure out what the key-value pairs are in your requirements. According to your requirement,it can be called as a 3-level-dict,here the key of first level is the value of Brand int the original dictionary, so I extract it from the original csv file as

dict_brand=new_dict.get(row['Brand'],dict())

which is going to judge if there exists the Brand value same as the original dict in our new dict, if yes, it just inserts, if no, it creates, then maybe the most complicated part is the second level or middle level, here you set the value of Product of original dict as the value of the new dict of key Brand, and the value of Product is also the key of the the third level dict which has Price and Cost of the original dict as the value,and here I extract them like:

dict_brand[row['Product']]={k:row[k] for k in ('Cost','Price')}

and finally, what we need to do is just set the created 'middle dict' as the value of our new dict which has Brand as the key. Finally, the output is

{'sony': {'blue widget': {'Price': '5', 'Cost': '4'}, 
'red widget': {'Price': '6', 'Cost': '5'}}, 
'microsoft': {'purple widget': {'Price': '7', 'Cost': '6'}, 
'green widget': {'Price': '7', 'Cost': '5'}}}

That's that.

Upvotes: 1

DSM
DSM

Reputation: 353499

import csv
from collections import defaultdict

def build_dict(source_file):
    projects = defaultdict(dict)
    headers = ['Product', 'Price', 'Cost', 'Brand']
    with open(source_file, 'rb') as fp:
        reader = csv.DictReader(fp, fieldnames=headers, dialect='excel',
                                skipinitialspace=True)
        for rowdict in reader:
            if None in rowdict:
                del rowdict[None]
            brand = rowdict.pop("Brand")
            product = rowdict.pop("Product")
            projects[brand][product] = rowdict
    return dict(projects)

source_file = 'merged.csv'
print build_dict(source_file)

produces

{'microsoft': {'green widget': {'Cost': '5', 'Price': '7'},
               'purple widget': {'Cost': '6', 'Price': '7'}},
 'sony': {'blue widget': {'Cost': '4', 'Price': '5'},
          'red widget': {'Cost': '5', 'Price': '6'}}}

from your input data (where merged.csv doesn't have the headers, only the data.)

I used a defaultdict here, which is just like a dictionary but when you refer to a key that doesn't exist instead of raising an Exception it simply makes a default value, in this case a dict. Then I get out -- and remove -- Brand and Product, and store the remainder.

All that's left I think would be to turn the cost and price into numbers instead of strings.

[modified to use DictReader directly rather than reader]

Upvotes: 6

Related Questions