Reputation: 71

Python - Dictionary from CSV file with Multiple Values per Key

I am trying to make a dictionary from a CSV file in Python. Let's say the CSV contains:

Student   food      amount
John      apple       15
John      banana      20
John      orange      1
John      grape       3
Ben       apple       2
Ben       orange      4
Ben       strawberry  8
Andrew    apple       10
Andrew    watermelon  3

What I'm envisioning is a dictionary whose keys are the student names and whose values are lists, where each entry corresponds to a different food. I would have to count the number of unique food items in the second column, and that count would be the length of each list. For example:

The value [15,20,1,3,0,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'John'.
The value [2,0,4,0,8,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Ben'.
The value [10,0,0,0,0,3] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Andrew'.

The expected output of the dict would look like this:

data = {'John': [15,20,1,3,0,0], 'Ben': [2,0,4,0,8,0], 'Andrew': [10,0,0,0,0,3]}

I'm having trouble creating the dictionary in the first place, and I'm not sure a dictionary is even the right approach. Here is what I have so far:

import csv
data = {}
# newline='' is what the csv module expects; the old 'rU' mode was removed in Python 3.11
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data[row['Student']] = row
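As written, this loop keeps only one row per student, because each assignment to data[row['Student']] replaces the previous value. Here is a quick check, assuming a comma-separated file and using io.StringIO as a stand-in for data.csv:

```python
import csv
import io

# Hypothetical in-memory stand-in for data.csv (comma-separated)
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
Ben,apple,2
"""

data = {}
for row in csv.DictReader(io.StringIO(SAMPLE)):
    # Same logic as the loop above: each row replaces the previous one
    data[row['Student']] = row

# Only John's last row survives; the apple row was overwritten
print(data['John']['food'])
```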

Thanks for taking the time to read. Any help would be greatly appreciated.

Upvotes: 3

Answers (4)

XrXr

Reputation: 2057

Here is a version using a regular dictionary, though defaultdict is definitely better.

import csv
data = {}
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # Convert the amount from a string so the lists hold numbers
        amount = int(row['amount'])
        if row['Student'] in data:
            data[row['Student']].append(amount)
        else:
            data[row['Student']] = [amount]
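Note that this groups the amounts in file order without recording which food each one belongs to. Two students with different foods can therefore end up with lists whose positions don't line up. This sketch runs the same grouping on a few of the question's rows, with io.StringIO standing in for data.csv:

```python
import csv
import io

# Hypothetical stand-in for data.csv, using a subset of the question's rows
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
Ben,apple,2
Ben,orange,4
"""

data = {}
for row in csv.DictReader(io.StringIO(SAMPLE)):
    if row['Student'] in data:
        data[row['Student']].append(int(row['amount']))
    else:
        data[row['Student']] = [int(row['amount'])]

# Both lists have two entries, but the second slot is banana for John
# and orange for Ben, so position alone doesn't identify the food
print(data)
```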

EDIT:

For matching indices:
import csv
from collections import defaultdict

# Each student gets one slot per known fruit, initialised to 0
data = defaultdict(lambda: [0, 0, 0, 0])
fruit_to_index = {'apple': 0, 'banana': 1, 'orange': 2, 'grape': 3}
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        index = fruit_to_index.get(row['food'])
        # Skip foods that have no slot in the mapping
        if index is not None:
            data[row['Student']][index] = int(row['amount'])

print(data) would then show:

defaultdict(<function <lambda> at address>,
{'John':   [15, 20, 1, 3],
 'Ben':    [2, 0, 4, 0],
 'Andrew': [10, 0, 0, 0]})

I think this is what you want.

EDIT2: I wrote this when the list of fruits didn't include strawberry and watermelon, but they should be very easy to add. If the list is too large to write out by hand, you can generate the fruit-to-index mapping from the file:

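fruit_to_index = {}
for row in reader:
    # Number each fruit in order of first appearance (apple=0, banana=1, ...)
    fruit_to_index.setdefault(row['food'], len(fruit_to_index))
# The loop above exhausts the reader, so rewind the file before the main loop
data_file.seek(0)
reader = csv.DictReader(data_file)

Unlike iterating over a set, whose order is arbitrary, this keeps the fruits in the order they first appear in the file, which matches the order used in the question.

data = defaultdict(lambda: [0, 0, 0, 0]) then becomes

data = defaultdict(lambda: [0] * len(fruit_to_index))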
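Putting the two passes together, here is a self-contained sketch that produces the zero-filled lists from the question. It assumes a comma-separated file with the header Student,food,amount, numbers the foods in order of first appearance, and uses io.StringIO as a stand-in for data.csv. For a real file, pass open('data.csv', newline='') instead. The helper name load_amounts is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for data.csv: the question's rows, comma-separated
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

def load_amounts(f):
    """Return (foods, data) where data[student][i] is the amount of foods[i]."""
    rows = list(csv.DictReader(f))  # read once so the rows can be scanned twice

    # First pass: number each food in order of first appearance
    food_to_index = {}
    for row in rows:
        food_to_index.setdefault(row['food'], len(food_to_index))

    # Second pass: one zero-filled slot per food, filled in from the rows
    data = {}
    for row in rows:
        slots = data.setdefault(row['Student'], [0] * len(food_to_index))
        slots[food_to_index[row['food']]] = int(row['amount'])
    return list(food_to_index), data

foods, data = load_amounts(io.StringIO(SAMPLE))
print(foods)
print(data)
```

Reading the rows into a list avoids rewinding the file. For very large files, the seek(0) approach avoids holding every row in memory. If a student can appear with the same food more than once, use += instead of = to sum the amounts.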

Upvotes: 3

nakedfanatic

Reputation: 3178

Use the setdefault method of the dict.

import csv
data = {}
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        # Insert an empty list the first time a student is seen, then append
        data.setdefault(row['Student'], []).append(int(row['amount']))

If the key (e.g. "John") doesn't exist yet, setdefault creates it with the supplied default value; in this case, the default is an empty list.
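The chained .append works because setdefault always returns the value stored under the key: the freshly inserted default the first time, and the existing list afterwards. Here is a minimal illustration with made-up values:

```python
data = {}
data.setdefault('John', []).append(15)  # key missing: inserts [] and returns it
data.setdefault('John', []).append(20)  # key present: returns the existing list
print(data)
```

One small cost is that the default [] is built on every call, even when it goes unused. This is one reason collections.defaultdict(list) is often preferred for this kind of grouping.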

Upvotes: 0

tzaman

Reputation: 47790

You probably actually want a nested dictionary structure; keeping a list and then trying to match indices to food names will get hairy fast.

import csv
from collections import defaultdict
data = defaultdict(dict)
with open('data.csv', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        data[row['Student']][row['food']] = int(row['amount'])

This will give you a structure like so:

{'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
 'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8},
 'Andrew': {'apple': 10, 'watermelon': 3}}

That lets you look up particular foods without having to try to cross-reference another list to figure out where to find the counts, and supports any number of food items without having to fill your lists with zeros for all the missing ones.
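If you later need the fixed-length lists from the question, you can derive them from the nested dictionary. This sketch assumes the amounts are already ints and that the column order should follow first appearance. The nested dict is typed in by hand for illustration.

```python
# Nested structure as produced above (amounts already converted to int)
data = {
    'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
    'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8},
    'Andrew': {'apple': 10, 'watermelon': 3},
}

# Collect every food once, keeping the order in which it is first seen
foods = list(dict.fromkeys(food for counts in data.values() for food in counts))

# Foods a student doesn't have become 0 via dict.get
vectors = {student: [counts.get(food, 0) for food in foods]
           for student, counts in data.items()}
print(foods)
print(vectors)
```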

If you want to be extra-fancy, you can use a nested defaultdict, so that looking up foods that didn't get entered returns zero automatically instead of raising a KeyError. Just change the data = defaultdict(dict) line to:

data = defaultdict(lambda: defaultdict(int))
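Here is a short illustration with a made-up entry. Note one side effect: reading a missing key from a defaultdict also stores the default, so lookups can grow the dictionary. Use .get(food, 0) if that matters.

```python
from collections import defaultdict

data = defaultdict(lambda: defaultdict(int))
data['Ben']['apple'] = 2

print(data['Ben']['banana'])  # missing food: returns 0 instead of raising KeyError
# The lookup above also inserted 'banana' with the value 0 into Ben's dict
print(dict(data['Ben']))
```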

Upvotes: 0

piokuc

Reputation: 26184

Try this; I think this is what you want. Notice the use of defaultdict. It could be done with a regular dictionary, but defaultdict is very handy in cases like this:

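import csv
from collections import defaultdict
data = defaultdict(list)
# Python 3's csv module needs a text-mode file opened with newline='' ('rb' was the Python 2 idiom)
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data[row['Student']].append(int(row['amount']))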

Upvotes: 1