Reputation: 71
I am trying to build a dictionary from a CSV file in Python. Let's say the CSV contains:
Student food amount
John apple 15
John banana 20
John orange 1
John grape 3
Ben apple 2
Ben orange 4
Ben strawberry 8
Andrew apple 10
Andrew watermelon 3
What I'm envisioning is a dictionary whose keys are the student names and whose values are lists in which each position corresponds to a different food. I would have to count the number of unique food items in the second column, and that count would be the length of each list. For example:
The value of [15,20,1,3,0,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'John'.
The value of [2,0,4,0,8,0] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Ben'.
The value of [10,0,0,0,0,3] would correspond to [apple, banana, orange, grape, strawberry, watermelon] for 'Andrew'.
The expected output of the dict would look like this:
dict={'John': [15,20,1,3,0,0], 'Ben': [2,0,4,0,8,0], 'Andrew': [10,0,0,0,0,3]}
I'm not sure how to create the dictionary to begin with, or whether a dictionary is even the right approach. Here is what I have so far:
import csv
data_file=open('data.csv','rU')
reader=csv.DictReader(data_file)
data={}
for row in reader:
    data[row['Student']]=row
data_file.close()
Thanks for taking the time to read. Any help would be greatly appreciated.
Upvotes: 3
Views: 7465
Reputation: 2057
Here is a version using a regular dictionary. defaultdict is definitely better, though.
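import csv
# newline='' is the mode the csv module expects on Python 3 ('rU' was removed in 3.11)
data_file = open('data.csv', newline='')
reader = csv.DictReader(data_file)
data = {}
for row in reader:
    # Append to the student's list, creating the list the first time the student appears
    if row['Student'] in data:
        data[row['Student']].append(row['amount'])
    else:
        data[row['Student']] = [row['amount']]
data_file.close()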
EDIT:
For matching indices:
import csv
from collections import defaultdict
data_file = open('data.csv', newline='')
reader = csv.DictReader(data_file)
# Every student starts with a zero for each of the four known fruits
data = defaultdict(lambda: [0, 0, 0, 0])
# Fruits missing from this mapping look up as None and are skipped below
fruit_to_index = defaultdict(lambda: None, {'apple': 0, 'banana': 1, 'orange': 2, 'grape': 3})
for row in reader:
    if fruit_to_index[row['food']] is not None:
        data[row['Student']][fruit_to_index[row['food']]] = int(row['amount'])
data_file.close()
print(data)
The output would be:
defaultdict(<function <lambda> at address>,
{'John': [15, 20, 1, 3],
'Ben': [2, 0, 4, 0],
'Andrew': [10, 0, 0, 0]})
I think this is what you want.
EDIT2: I wrote this when the list of fruits didn't include strawberry and watermelon, but they should be very easy to add. If the list is too large to type out by hand, you can generate the fruit-to-index mapping:
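# First pass over the file: collect every distinct food
set_of_fruits = set()
for row in reader:
    set_of_fruits.add(row['food'])
# Give each food its own index
c = 0
for e in set_of_fruits:
    fruit_to_index[e] = c
    c += 1
# The reader is now exhausted: call data_file.seek(0) and create a new
# DictReader before looping over the rows again to fill in the amounts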
Note that the iteration order of set_of_fruits is not guaranteed, so the index each fruit gets can vary between runs. The line
data = defaultdict(lambda:[0,0,0,0])
becomes
data = defaultdict(lambda:[0 for x in range(len(set_of_fruits))])
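Putting the two passes together, a rough sketch could look like the one below. It is written for Python 3.7+ and rests on a few assumptions: the file is comma-separated with the header Student,food,amount, the data is read from an in-memory string instead of data.csv, and the names SAMPLE and build_vectors are made up for illustration. It also swaps the set for dict.fromkeys, so the foods keep their order of first appearance and the indices are reproducible.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory copy of the question's data (comma-separated is an assumption).
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""


def build_vectors(text):
    """Return (foods, data), where data maps each student to a list of amounts."""
    # Read all rows once so they can be reused for both passes.
    rows = list(csv.DictReader(io.StringIO(text)))

    # Pass 1: unique foods in order of first appearance (dicts keep insertion order).
    foods = list(dict.fromkeys(row['food'] for row in rows))
    fruit_to_index = {food: i for i, food in enumerate(foods)}

    # Pass 2: every student starts with one zero per food, then amounts are filled in.
    data = defaultdict(lambda: [0] * len(foods))
    for row in rows:
        data[row['Student']][fruit_to_index[row['food']]] = int(row['amount'])
    return foods, dict(data)


foods, data = build_vectors(SAMPLE)
print(foods)
for student, amounts in data.items():
    print(student, amounts)
```

For the sample data this should give John [15, 20, 1, 3, 0, 0], Ben [2, 0, 4, 0, 8, 0] and Andrew [10, 0, 0, 0, 0, 3], which is the layout the question asks for.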
Upvotes: 3
Reputation: 3178
Use the setdefault method of the dict.
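import csv
# newline='' is the mode the csv module expects on Python 3 ('rU' was removed in 3.11)
data_file = open('data.csv', newline='')
reader = csv.DictReader(data_file)
data = {}
for row in reader:
    # setdefault returns the student's list, inserting an empty one first if needed
    data.setdefault(row['Student'], []).append(row['amount'])
data_file.close()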
If the key, e.g. "John", doesn't exist, setdefault creates it with the supplied default value and returns that value. In this case the default is an empty list.
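To see the behaviour without a file, here is a small sketch. The rows are hard-coded as an assumption (they stand in for what csv.DictReader would yield), and the amounts are converted with int, which the code above does not do.

```python
# Hard-coded stand-ins for the rows csv.DictReader would produce (illustrative only).
rows = [
    {'Student': 'John', 'food': 'apple', 'amount': '15'},
    {'Student': 'John', 'food': 'banana', 'amount': '20'},
    {'Student': 'Ben', 'food': 'apple', 'amount': '2'},
]

data = {}
for row in rows:
    # The first time a student is seen, setdefault stores [] under that key and
    # returns it; on later rows it returns the list that is already there.
    data.setdefault(row['Student'], []).append(int(row['amount']))

print(data)  # {'John': [15, 20], 'Ben': [2]}
```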
Upvotes: 0
Reputation: 47790
You probably actually want a nested dictionary structure; keeping a list and then trying to match indices to food names will get hairy fast.
import csv
from collections import defaultdict
data = defaultdict(dict)
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        # Convert the amount so the counts are stored as integers
        data[row['Student']][row['food']] = int(row['amount'])
This will give you a structure like so:
{'John': {'apple': 15, 'banana': 20, 'orange': 1, 'grape': 3},
'Ben': {'apple': 2, 'orange': 4, 'strawberry': 8}, #etc.
}
That lets you look up particular foods without having to try to cross-reference another list to figure out where to find the counts, and supports any number of food items without having to fill your lists with zeros for all the missing ones.
If you want to be extra-fancy, you can use a nested defaultdict, so that looking up foods that didn't get entered will return zeros automatically instead of raising KeyErrors; just change the line that creates data to:
data = defaultdict(lambda: defaultdict(int))
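A quick sketch of how the nested defaultdict behaves, with a few hard-coded entries standing in for the CSV rows (the tuples and the foods list are assumptions for illustration):

```python
from collections import defaultdict

data = defaultdict(lambda: defaultdict(int))

# Hard-coded (student, food, amount) entries instead of reading data.csv.
entries = [('John', 'apple', 15), ('John', 'banana', 20), ('Ben', 'orange', 4)]
for student, food, amount in entries:
    data[student][food] = amount

print(data['John']['apple'])  # 15
print(data['Ben']['banana'])  # 0, Ben has no banana entry

# If a fixed-order list is still needed, it can be derived from the nested dict.
foods = ['apple', 'banana', 'orange']
ben_vector = [data['Ben'][food] for food in foods]
print(ben_vector)  # [0, 0, 4]
```

One thing to be aware of: reading a missing key from a defaultdict also stores that key with its default value, so use data['Ben'].get('banana', 0) if you want to look something up without adding it.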
Upvotes: 0
Reputation: 26184
Try this; I think this is what you want. Notice the use of defaultdict: it could be done with a regular dictionary, but defaultdict is very handy in such cases:
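import csv
from collections import defaultdict
# Missing students start out with an empty list
data = defaultdict(list)
# On Python 3 the csv module needs a text-mode file opened with newline='' (not 'rb')
with open('data.csv', newline='') as data_file:
    reader = csv.DictReader(data_file)
    for row in reader:
        data[row['Student']].append(row['amount'])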
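As a rough illustration of what this produces, the sketch below runs the same loop over an in-memory copy of the question's data. The SAMPLE string and the comma delimiter are assumptions, and the amounts are converted with int here so the result holds numbers instead of strings.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory copy of the question's data (comma-separated is an assumption).
SAMPLE = """Student,food,amount
John,apple,15
John,banana,20
John,orange,1
John,grape,3
Ben,apple,2
Ben,orange,4
Ben,strawberry,8
Andrew,apple,10
Andrew,watermelon,3
"""

data = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    # defaultdict(list) creates the empty list on first access to a new student.
    data[row['Student']].append(int(row['amount']))

print(dict(data))
# {'John': [15, 20, 1, 3], 'Ben': [2, 4, 8], 'Andrew': [10, 3]}
```

Note that the lists only hold the amounts in file order and have different lengths, so on their own they don't tell you which food each number belongs to.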
Upvotes: 1