Elomack

Reputation: 1

Nested dictionary issue

I need to create a program that takes a CSV file and returns a nested dictionary. The keys for the outer dictionary should be the first value in each row, starting from the second one (so as to omit the row with the column names). The value for each key in the outer dictionary should be another dictionary, which I explain below.

The inner dictionary's keys should be the column names, while the values should be the value corresponding to that column in each row.

Example:

For a CSV file like this:

column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78

I would like to print out the data in this form:

my_dict = {
'4': {'column1':'4','column2':'12', 'column3':'5', 'column4':'11'},
'29': {'column1':'29', 'column2':'47', 'column3':'23', 'column4':'41'},
'66': {'column1':'66', 'column2':'1', 'column3':'98', 'column4':'78'}
}

The closest I've gotten so far (which isn't even close):

import csv
import collections

def csv_to_dict(file, delimiter, quotechar):
    list_inside_dict = collections.defaultdict(list)
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            for k, v in row.items():
                list_inside_dict[k].append(v)
    return dict(list_inside_dict)

If I try to run the function with the example CSV file above, delimiter = ",", and quotechar = "'", it returns the following:

{'column1': ['4', '29', '66'], ' column2': ['12', '47', '1'], ' column3': ['5', '23', '98'], ' column4': ['11', '41', '78']}

At this point I got lost. I tried to change:

list_inside_dict = collections.defaultdict(list)

to

list_inside_dict = collections.defaultdict(dict)

Then I tried simply assigning the value for each key, since I cannot append to a dictionary, but it all got really messy. So I started from scratch and ended up in the same place.

Upvotes: 0

Views: 319

Answers (4)

dawg

Reputation: 104111

A couple of zips will get you what you want.

For demonstration, the CSV here is a string rather than a file. Just replace that part with your open file.

Given:

s='''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''

You can do:

import csv 

data=[]
for row in csv.reader(s.splitlines()):  # replace s.splitlines() with your open file object
    data.append(row)

header=data.pop(0)
col1=[e[0] for e in data]
di={}
for c,row in zip(col1,data):
    di[c]=dict(zip(header, row))

Then:

>>> di
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
 '29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
 '66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}

On Python 3.7+, dicts are guaranteed to maintain insertion order (in CPython 3.6 this is an implementation detail). Earlier Pythons make no such guarantee.
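
The leading spaces in keys such as ' column2' come from the spaces after the commas in the header row. A minimal sketch, assuming the same sample data as above, that passes skipinitialspace=True (a standard csv formatting parameter) so whitespace right after each delimiter is dropped. The dict comprehension is just a compact stand-in for the loop above, not a different technique:

```python
import csv

s = '''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''

# skipinitialspace=True ignores whitespace immediately following a delimiter,
# so the header fields come out as 'column2' rather than ' column2'.
rows = list(csv.reader(s.splitlines(), skipinitialspace=True))
header, data = rows[0], rows[1:]

# Key each inner dict by the first value of its row.
di = {row[0]: dict(zip(header, row)) for row in data}
print(di)
```

Note that this only strips whitespace after the delimiter; trailing spaces before a comma would still end up in the field.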

Upvotes: 0

Ben Botvinick

Reputation: 3345

This is similar to this answer; however, I believe it could be better explained.

import csv

with open('filename.csv', newline='') as f:
    headers, *data = csv.reader(f)
    output = {}
    for firstInRow, *restOfRow in data:
        output[firstInRow] = dict(zip(headers, [firstInRow, *restOfRow]))
    print(output)

What this does is loop through the rows of data in the file, unpacking the first value of each row as the key and the remaining values into a list. The value for that key in the output dictionary is then built by zipping the list of headers with the full list of values. The output[firstInRow] = ... line is the same as writing output[firstInRow] = {headers[0]: firstInRow, headers[1]: restOfRow[0], ...}.
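
To make that equivalence concrete, here is a small sketch for a single row. The header names are written without leading spaces purely for readability (an assumption; the sample file as given yields ' column2' and so on), and the snake_case variable names are illustrative:

```python
headers = ['column1', 'column2', 'column3', 'column4']
row = ['4', '12', '5', '11']

# Star-unpacking splits the row into its first value and a list of the rest.
first_in_row, *rest_of_row = row

# zip pairs each header with the value in the same position.
inner = dict(zip(headers, [first_in_row, *rest_of_row]))

# The same dictionary written out by hand, index by index.
by_hand = {
    headers[0]: first_in_row,
    headers[1]: rest_of_row[0],
    headers[2]: rest_of_row[1],
    headers[3]: rest_of_row[2],
}

print(inner == by_hand)
```

Since [first_in_row, *rest_of_row] just rebuilds the original row, dict(zip(headers, row)) would give the same result; the unpacking is only needed to get hold of the key.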

Output:

{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}

Upvotes: 0

a_guest

Reputation: 36339

You can use pandas for that task.

>>> import pandas as pd
>>> df = pd.read_csv('/path/to/file.csv')
>>> df.index = df.iloc[:, 0]
>>> df.to_dict('index')

Not sure why you want to duplicate the value of the first column, but if you don't need that, the above simplifies to:

>>> pd.read_csv('/path/to/file.csv', index_col=0).to_dict('index')

Upvotes: 1

Ajax1234

Reputation: 71471

You can use a dictionary comprehension:

import csv
with open('filename.csv', newline='') as f:
  header, *data = csv.reader(f)
  final_dict = {a:dict(zip(header, [a, *b])) for a, *b in data}

Output:

{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}, 
 '29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'}, 
 '66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
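
For comparison, a sketch that stays closer to the csv.DictReader approach from the question. The function name mirrors the question's csv_to_dict, but here it takes an already-open file object instead of a path and gives delimiter and quotechar default values (both are assumptions made so the example is self-contained); it also assumes the file is non-empty, so that a header row exists:

```python
import csv
import io

def csv_to_dict(csvfile, delimiter=',', quotechar='"'):
    # DictReader already yields one dict per row, keyed by the header names.
    reader = csv.DictReader(
        csvfile,
        delimiter=delimiter,
        quotechar=quotechar,
        skipinitialspace=True,  # drop the space after each comma in the header
    )
    # Accessing fieldnames reads the header row; the first column supplies the outer keys.
    key_field = reader.fieldnames[0]
    return {row[key_field]: dict(row) for row in reader}

# An in-memory stand-in for the CSV file from the question.
sample = io.StringIO(
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)
my_dict = csv_to_dict(sample)
print(my_dict)
```

With a real file, the call would look like csv_to_dict(open_file) inside a with open(path, newline='') block.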

Upvotes: 1
