Reputation: 199
I want to write a python script which merges scores for two csv files based on a specific key.
file1.csv
id, uid, score1, score2
1,abc,3,5
2,def,2,4
file2.csv
id, uid, score3
1,def,5
2,abc,4
example of desired joined file for given key 'uid':
uid, score1, score2, score3
abc, 3, 5, 4
def, 2, 4, 5
My code looks like it should work but for some reason I keep getting
KeyError: 'uid'
when I try and run this:
import pandas as pd
csv1 = pd.read_csv('file1.csv')
csv2 = pd.read_csv('file2.csv')
csv1.drop(csv1.columns[[0]], axis=1, inplace=True)
csv2.drop(csv2.columns[[0]], axis=1, inplace=True)
merged = pd.merge(csv1, csv2, on='uid')
print merged
I even tried replacing
merged = pd.merge(csv1, csv2, on='uid')
with
merged = csv1.merge(csv2, on='uid')
and I got the same error.
I think the drop might be modifying the indexes somehow so that merge can't read 'uid' but I don't know how to fix it.
Upvotes: 1
Views: 106
Reputation: 1289
Because pandas' read_csv method will not strip spaces in the first line for you. You can see all the keys in your csv dataframe by printing csv1.keys(), which will like this:
Index([u'id', u' uid', u' score1', u' score2'], dtype='object')
So you have to use ' uid' as merge key or change the first lines in your file1.csv or file2.csv.
P.S. You may look at this question to save a little strip work by hands
Upvotes: 1
Reputation: 23
Instead of dropping the 'id' column in both, can you try
merged = pd.merge(csv1,csv2, on=['id','uid'])
Upvotes: 0