Reputation:
I have three CSV files:
file 1
id,code
1,a
2,b
3,c
4,d
file 2
no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y
file 3
record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556
I would like to merge count
and sum
from file 2 with file 1 if id
and no
match and merge mean
and median
from file 3 with file 1 if id
and record
match. I tried the below code but the final output file has lots of full fields even though they match with id
.
df = pd.concat([file1, file2,file3], join_axes=[df.index])
df= df.drop["class"]
Upvotes: 0
Views: 59
Reputation: 863301
I believe you need set indexes from first column in read_csv
:
import pandas as pd
from pandas.compat import StringIO
temp=u"""id,code
1,a
2,b
3,c
4,d"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file1 = pd.read_csv(StringIO(temp), index_col=[0])
print (file1)
temp=u"""
no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file2 = pd.read_csv(StringIO(temp), index_col=[0])
print (file2)
temp=u"""
record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file3 = pd.read_csv(StringIO(temp), index_col=[0])
print (file3)
df = pd.concat([file1, file2,file3], axis=1).drop("class", axis=1)
print (df)
code count sum mean median
1 a 674.0 78256.0 67234.0 785.0
2 b NaN NaN 67233.0 664.0
3 c 567.0 55562.0 5437.0 553.0
4 d 344.0 56789.0 34423.0 556.0
5 NaN 673.0 66259.0 NaN NaN
Also if indexes are not set in read_csv
is necessary add set_index
:
import pandas as pd
from pandas.compat import StringIO
temp=u"""id,code
1,a
2,b
3,c
4,d"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file1 = pd.read_csv(StringIO(temp))
print (file1)
temp=u"""
no,count,sum,class
3,567,55562,Y
5,673,66259,L
1,674,78256,Y
4,344,56789,Y"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file2 = pd.read_csv(StringIO(temp))
print (file2)
temp=u"""
record,mean,median
3,5437,553
2,67233,664
1,67234,785
4,34423,556"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
file3 = pd.read_csv(StringIO(temp))
print (file3)
df=pd.concat([file1.set_index('id'), file2.set_index('no'),file3.set_index('record')],axis=1)
.drop("class", axis=1)
print (df)
code count sum mean median
1 a 674.0 78256.0 67234.0 785.0
2 b NaN NaN 67233.0 664.0
3 c 567.0 55562.0 5437.0 553.0
4 d 344.0 56789.0 34423.0 556.0
5 NaN 673.0 66259.0 NaN NaN
Or for inner join add join='inner'
to concat
:
df = pd.concat([file1.set_index('id'),
file2.set_index('no'),
file3.set_index('record')], join='inner', axis=1).drop("class", axis=1)
print (df)
code count sum mean median
3 c 567 55562 5437 553
1 a 674 78256 67234 785
4 d 344 56789 34423 556
Upvotes: 1