Reputation: 33
I've written a program that takes messy data from our machine booking system, cleans it up and spits out the user, the hours they've used and the machine name. I've been handling everything as CSV files in Pandas.
The output file contains three columns - Resource name, User name and Hours.
I've also built a Dictionary file (also csv) containing User name (as the key) and Stores Code.
I want to take my output file and, using the dictionary, add a fourth column containing the Stores code. Here's the relevant portion of the code:
import csv
import pandas as pd
#open output file, add headers
process = pd.read_csv('C:\Users\someone\Desktop\pythonwork\data\processed2.csv',header=None)
process.columns = [ 'Resource', 'Name', 'Hours']
#read code list
with open('C:\Users\someone\Desktop\pythonwork\data\codes.csv') as f:
    codes = dict(filter(None, csv.reader(f)))
for i in process.index:
    nam=str(process['Name'])
    grantcode=codes.get(nam, 0)
    print grantcode
It runs with no errors but the trouble is that it returns just zeros for all the codes. If I add a line that queries the dictionary with an actual name it pulls out the correct value. Is there any way to query a dictionary using a variable?
Upvotes: 0
Views: 76
Reputation: 3881
Your main problem is in the following line:
nam=str(process['Name'])
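This converts the whole Name column (a pandas Series) to a single string rather than picking out the name for the current row. That string never exists as a key in the dictionary, so codes.get falls back to 0 every time. My suggestion is to build the column and then insert it into the data frame: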
grant_codes = [codes.get(name, 0) for name in process['Name']]
process['Code'] = grant_codes
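To see the difference on a small scale, here is a minimal sketch that uses made-up in-memory data instead of your CSV files. The resource names, user names and codes below are hypothetical placeholders, not values from your system.

```python
import pandas as pd

# Hypothetical stand-in for processed2.csv (three columns, headers added)
process = pd.DataFrame(
    [["Laser", "Alice", 2.5], ["Mill", "Bob", 1.0], ["Laser", "Carol", 4.0]],
    columns=["Resource", "Name", "Hours"],
)

# Hypothetical stand-in for the dictionary built from codes.csv
codes = {"Alice": "S100", "Bob": "S200"}

# str() of the whole column gives one long string describing the Series,
# which is not a key in the dictionary, so the default 0 comes back
whole_column = str(process["Name"])
lookup_whole = codes.get(whole_column, 0)

# Looking up each name individually finds the codes that exist
process["Code"] = [codes.get(name, 0) for name in process["Name"]]
```

Here "Carol" has no entry in the dictionary, so her row gets the default 0 while the other two rows get their codes.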
If you wanted to keep iterating over the index, you'd have to do something like the following but I recommend the above:
for i in process.index:
    nam = process.at[i, 'Name']
    grant_code = codes.get(nam, 0)
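The loop above only computes each code. One way to keep the results, sketched here with the same kind of made-up data (the names and codes are hypothetical), is to collect them in a list and assign the list as the new column once the loop finishes.

```python
import pandas as pd

# Hypothetical stand-ins for the processed file and the codes dictionary
process = pd.DataFrame(
    [["Laser", "Alice", 2.5], ["Mill", "Bob", 1.0]],
    columns=["Resource", "Name", "Hours"],
)
codes = {"Alice": "S100"}

grant_codes = []
for i in process.index:
    # .at returns the single value at row label i, column 'Name'
    nam = process.at[i, "Name"]
    grant_codes.append(codes.get(nam, 0))

# Add the collected codes as the fourth column
process["Code"] = grant_codes
```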
Upvotes: 1