Reputation: 243
I have two csv files: test1
A B
1 a
2 b
3 c
4 d
5 e
test2
A C D B
3 x 25 101
2 y 0.35 11
1 z 0.45 111
6 k 0.55 1101
7 l 0.65 1010
I want to merge them on = A but i need only the columns test1.A,B and test2.D, B in the final file. And since both these files have a same column name B, need to rename it while joining itself. The output file should be like this:
A B D B1
1 a 0.45 1110
2 b 0.35 1010
3 c 25 1011
4 d
5 e 0.55
6 0.65 1000
7 1111
where B1 corresponds to column B in Table2. Both B columns in test1 and test2 are not same.
import pandas
csv1 = pandas.read_csv('test1.csv',dtype='unicode')
csv2 = pandas.read_csv('test2.csv',dtype='unicode')
merged = pandas.merge(csv1[list('AB')],csv2[list('DB')], on='A',how="outer")
merged.to_csv("outputtest.csv", index=False)
This is giving me error:
KeyError: "['B'] not in index"
Upvotes: 1
Views: 287
Reputation: 863751
You can drop
column C
in csv2
and then merge
with parameter suffixes
and last fillna
by empty string
:
merged = pd.merge(csv1,
csv2.drop('C', axis=1),
on='A',
how="outer",
suffixes=('','1')).fillna('')
print (merged)
A B D B1
0 1.0 a 0.45 111
1 2.0 b 0.35 11
2 3.0 c 25 101
3 4.0 d
4 5.0 e
5 6.0 0.55 1101
6 7.0 0.65 1010
If in csv
is many columns, you can use subset - only columns which need and column for join - in this solution column A
:
merged = pd.merge(csv1[['A','B']],
csv2[['A','D','B']],
on='A',
how="outer",
suffixes=('','1')).fillna('')
print (merged)
A B D B1
0 1.0 a 0.45 111
1 2.0 b 0.35 11
2 3.0 c 25 101
3 4.0 d
4 5.0 e
5 6.0 0.55 1101
6 7.0 0.65 1010
Or:
merged = pd.merge(csv1[list('AB')],
csv2[list('ADB')],
on='A',
how="outer",
suffixes=('','1')).fillna('')
print (merged)
A B D B1
0 1.0 a 0.45 111
1 2.0 b 0.35 11
2 3.0 c 25 101
3 4.0 d
4 5.0 e
5 6.0 0.55 1101
6 7.0 0.65 1010
Upvotes: 1