Reputation: 1
I need to create a program that takes a CSV file and returns a nested dictionary. The keys for the outer dictionary should be the first value in each row, starting from the second one (so as to omit the row with the column names). The value for each key in the outer dictionary should be another dictionary, which I explain below.
The inner dictionary's keys should be the column names, while the values should be the value corresponding to that column in each row.
Example:
For a CSV file like this:
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78
I would like to print out the data in this form:
my_dict = {
'4': {'column1':'4','column2':'12', 'column3':'5', 'column4':'11'},
'29': {'column1':'29', 'column2':'47', 'column3':'23', 'column4':'41'},
'66': {'column1':'66', 'column2':'1', 'column3':'98', 'column4':'78'}
}
The closest I've gotten so far (which isn't even close):
import csv
import collections
def csv_to_dict(file, delimiter, quotechar):
    list_inside_dict = collections.defaultdict(list)
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile, delimiter=delimiter, quotechar=quotechar)
        for row in reader:
            for (k, v) in row.items():
                list_inside_dict[k].append(v)
    return dict(list_inside_dict)
If I try to run the function with the example CSV file above, delimiter = ",", and quotechar = "'", it returns the following:
{'column1': ['4', '29', '66'], ' column2': ['12', '47', '1'], ' column3': ['5', '23', '98'], ' column4': ['11', '41', '78']}
At this point I got lost. I tried changing:
list_inside_dict = collections.defaultdict(list)
to
list_inside_dict = collections.defaultdict(dict)
and then simply setting the value for each key, since I cannot append to a dictionary, but it all got really messy. So I started from scratch and ended up in the same place.
Upvotes: 0
Views: 319
Reputation: 104111
A couple of zips will get you what you want.
Instead of a file, we can use a string for the CSV. Just replace that part with a file.
Given:
s='''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''
You can do:
import csv
data=[]
for row in csv.reader(s.splitlines()): # replace 's.splitlines()' with your open file object
    data.append(row)
header=data.pop(0)
col1=[e[0] for e in data]
di={}
for c,row in zip(col1,data):
    di[c]=dict(zip(header, row))
Then:
>>> di
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
On Python 3.7+, dicts are guaranteed to maintain insertion order (in CPython 3.6 this was an implementation detail). Earlier versions of Python do not preserve it.
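Note that the keys above keep the space that follows each comma in the header row (' column2' rather than 'column2'). A minimal sketch, assuming the clean keys from the question are wanted, passes `skipinitialspace=True` to `csv.reader`; the names `rows` and `data` are only illustrative.

```python
import csv

s = '''\
column1, column2, column3, column4
4,12,5,11
29,47,23,41
66,1,98,78'''

# skipinitialspace=True drops whitespace that immediately follows a delimiter,
# so ' column2' is read as 'column2'.
rows = list(csv.reader(s.splitlines(), skipinitialspace=True))
header, data = rows[0], rows[1:]

# Key each row by its first value; the inner dict maps column name -> value.
di = {row[0]: dict(zip(header, row)) for row in data}

print(di['4'])
# {'column1': '4', 'column2': '12', 'column3': '5', 'column4': '11'}
```

With a real file, the same option can be passed when wrapping the open file object instead of `s.splitlines()`.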
Upvotes: 0
Reputation: 3345
This is similar to this answer; however, I believe it could be better explained.
import csv
with open('filename.csv') as f:
    headers, *data = csv.reader(f)
output = {}
for firstInRow, *restOfRow in data:
    output[firstInRow] = dict(zip(headers, [firstInRow, *restOfRow]))
print(output)
This loops through the data rows of the file, unpacking the first value of each row into firstInRow (used as the key) and the remaining values into the list restOfRow. The value stored under that key in the output dictionary is built by zipping the list of headers with the full list of row values. That output[firstInRow] = ... line is the same as writing output[firstInRow] = {headers[0]: firstInRow, headers[1]: restOfRow[0], ...}.
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
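To make the unpacking and zipping step concrete, here is a small sketch for a single row. The header list is written out by hand (including the leading spaces seen in the output above), and `pairs` and `inner` are illustrative names, not part of the answer's code.

```python
# Header row as csv.reader returns it for the example file
# (the leading spaces come from "column1, column2, ...").
headers = ['column1', ' column2', ' column3', ' column4']

# Starred unpacking: the first item goes to firstInRow, the rest to a list.
firstInRow, *restOfRow = ['4', '12', '5', '11']
print(firstInRow)  # 4
print(restOfRow)   # ['12', '5', '11']

# zip pairs items by position; dict turns the pairs into key/value entries.
pairs = list(zip(headers, [firstInRow, *restOfRow]))
inner = dict(pairs)
print(inner)
# {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'}
```

Since `[firstInRow, *restOfRow]` simply rebuilds the original row, looping over whole rows and using `row[0]` as the key would give the same result.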
Upvotes: 0
Reputation: 36339
You can use pandas for that task.
>>> import pandas as pd
>>> df = pd.read_csv('/path/to/file.csv')
>>> df.index = df.iloc[:, 0]
>>> df.to_dict('index')
Not sure why you want to duplicate the value of the first column, but in case you don't, the above simplifies to:
>>> pd.read_csv('/path/to/file.csv', index_col=0).to_dict('index')
Upvotes: 1
Reputation: 71471
You can use a dictionary comprehension:
import csv
with open('filename.csv') as f:
    header, *data = csv.reader(f)
final_dict = {a:dict(zip(header, [a, *b])) for a, *b in data}
Output:
{'4': {'column1': '4', ' column2': '12', ' column3': '5', ' column4': '11'},
'29': {'column1': '29', ' column2': '47', ' column3': '23', ' column4': '41'},
'66': {'column1': '66', ' column2': '1', ' column3': '98', ' column4': '78'}}
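The same comprehension idea can also be sketched with `csv.DictReader`, which the question started from. This is only one possible variant: the function name `csv_to_nested_dict`, the `csv_text` sample, and the use of `io.StringIO` in place of a real file are assumptions made for illustration, and `skipinitialspace=True` is added so the keys come out without leading spaces.

```python
import csv
import io

# Stand-in for the example file; with a real file, pass the object
# returned by open(path, newline='') instead.
csv_text = (
    "column1, column2, column3, column4\n"
    "4,12,5,11\n"
    "29,47,23,41\n"
    "66,1,98,78\n"
)

def csv_to_nested_dict(csvfile, delimiter=',', quotechar='"'):
    """Map the first value of each row to a dict of column name -> value."""
    reader = csv.DictReader(
        csvfile,
        delimiter=delimiter,
        quotechar=quotechar,
        skipinitialspace=True,  # read ' column2' as 'column2'
    )
    key_column = reader.fieldnames[0]  # name of the first column
    return {row[key_column]: dict(row) for row in reader}

result = csv_to_nested_dict(io.StringIO(csv_text))
print(result['29'])
# {'column1': '29', 'column2': '47', 'column3': '23', 'column4': '41'}
```

Each row from `DictReader` is already a mapping of column name to value, so only the outer key has to be chosen.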
Upvotes: 1