Reputation: 161
I have 2 Excel Sheets both put into a Pandas Dataframe.
Ex:
1.csv
EmployeeID | DOB | Age
A | 12/11/1978 | 2
B | 12/09/1988 | 8
C | 12/02/1985 | 20
D | 11/11/1990 | 25
2.csv
MemberIdentifier| DateofBirth| YOW
A | 12/11/1978 | 2
B | 12/09/1988 | 8
C | 12/02/1985 | 20
result.csv
EmployeeID | MemberIdentifier | DOB |DateofBirth| Exp | YOW
A | A | 12/11/1978 | 12/11/1978| 2 | 2
B | B | 12/09/1988 | 12/09/1988| 8 | 8
C | C | 12/02/1985 | 12/02/1985| 20 | 20
D | NA | NA | NA | NA | NA
Now how do I get a result which compares both the sheets based on the data not on the basis of column names because column names can be anything.
Any hint hint or idea of how to start will be great!!
Upvotes: 3
Views: 1298
Reputation: 862681
You need parameters left_on
and right_on
in merge
with left join:
df = pd.merge(df1, df2, left_on='EmployeeID', right_on='MemberIdentifier', how='left')
print (df)
EmployeeID DOB Age MemberIdentifier DateofBirth YOW
0 A 12/11/1978 2 A 12/11/1978 2.0
1 B 12/09/1988 8 B 12/09/1988 8.0
2 C 12/02/1985 20 C 12/02/1985 20.0
3 D 11/11/1990 25 NaN NaN NaN
If need match columns in two DataFrames
use:
d = {}
for col2 in df2.columns:
for col1 in df1.columns:
cond = df2[col2].isin(df1[col1]).all()
if cond:
d[col2] = col1
print (d)
{'MemberIdentifier': 'EmployeeID', 'DateofBirth': 'DOB', 'YOW': 'Age'}
Last you can rename columns by dict
:
df2 = df2.rename(columns=d)
print (df2)
EmployeeID DOB Age
0 A 12/11/1978 2
1 B 12/09/1988 8
2 C 12/02/1985 20
Upvotes: 3