Diganta Bharali
Diganta Bharali

Reputation: 243

Merge two csv files with custom columns

I have two csv files: test1

A   B 
1   a
2   b
3   c
4   d
5   e

test2

A   C   D   B
3   x   25  101
2   y   0.35    11
1   z   0.45    111
6   k   0.55    1101
7   l   0.65    1010

I want to merge them on = A but i need only the columns test1.A,B and test2.D, B in the final file. And since both these files have a same column name B, need to rename it while joining itself. The output file should be like this:

A   B   D       B1
1   a   0.45    1110
2   b   0.35    1010
3   c   25      1011
4   d       
5   e   0.55    
6       0.65    1000
7               1111

where B1 corresponds to column B in Table2. Both B columns in test1 and test2 are not same.

import pandas

csv1 = pandas.read_csv('test1.csv',dtype='unicode')
csv2 = pandas.read_csv('test2.csv',dtype='unicode')
merged = pandas.merge(csv1[list('AB')],csv2[list('DB')], on='A',how="outer")
merged.to_csv("outputtest.csv", index=False)

This is giving me error:

KeyError: "['B'] not in index"

Upvotes: 1

Views: 287

Answers (1)

jezrael
jezrael

Reputation: 863751

You can drop column C in csv2 and then merge with parameter suffixes and last fillna by empty string:

merged = pd.merge(csv1,
                  csv2.drop('C', axis=1), 
                  on='A',
                  how="outer", 
                  suffixes=('','1')).fillna('')
print (merged)
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010

If in csv is many columns, you can use subset - only columns which need and column for join - in this solution column A:

merged = pd.merge(csv1[['A','B']],
                  csv2[['A','D','B']], 
                  on='A',
                  how="outer", 
                  suffixes=('','1')).fillna('')
print (merged)
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010

Or:

merged = pd.merge(csv1[list('AB')],
                  csv2[list('ADB')], 
                  on='A',
                  how="outer", 
                  suffixes=('','1')).fillna('')
print (merged)
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010

Upvotes: 1

Related Questions