nivo1000
nivo1000

Reputation: 199

how to write a python script that merges columns in two csv files based on a specific key

I want to write a python script which merges scores for two csv files based on a specific key.

file1.csv

    id, uid, score1, score2
    1,abc,3,5
    2,def,2,4

file2.csv

    id, uid, score3
    1,def,5
    2,abc,4

example of desired joined file for given key 'uid':

    uid, score1, score2, score3
    abc, 3, 5, 4
    def, 2, 4, 5

My code looks like it should work but for some reason I keep getting

    KeyError: 'uid'

when I try and run this:

    import pandas as pd

    csv1 = pd.read_csv('file1.csv')
    csv2 = pd.read_csv('file2.csv')
    csv1.drop(csv1.columns[[0]], axis=1, inplace=True)
    csv2.drop(csv2.columns[[0]], axis=1, inplace=True)

    merged = pd.merge(csv1, csv2, on='uid')
    print merged

I even tried replacing

    merged = pd.merge(csv1, csv2, on='uid')

with

    merged = csv1.merge(csv2, on='uid')

and I got the same error.

I think the drop might be modifying the indexes somehow so that merge can't read 'uid' but I don't know how to fix it.

Upvotes: 1

Views: 106

Answers (2)

YCFlame
YCFlame

Reputation: 1289

Because pandas' read_csv method will not strip spaces in the first line for you. You can see all the keys in your csv dataframe by printing csv1.keys(), which will like this:

Index([u'id', u' uid', u' score1', u' score2'], dtype='object')

So you have to use ' uid' as merge key or change the first lines in your file1.csv or file2.csv.

P.S. You may look at this question to save a little strip work by hands

Upvotes: 1

Jay
Jay

Reputation: 23

Instead of dropping the 'id' column in both, can you try

merged = pd.merge(csv1,csv2, on=['id','uid'])

Upvotes: 0

Related Questions