Reputation: 894
I have three data frames:
Frame1:
import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)
ip
Frame2:
cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)
cusipmap
Frame3:
isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)
isinmap
I want to add a column to frame 1 , keeping the number of rows in Frame 1 constant ( left join) by first joining on CUSIP on Frame 2 and Then with ISIN on Frame 3. HOWEVER, I want to do the ISIN join on frame 3 only if there is a NaN produced as part of the Frame 2 join. SO, the output that I am looking for is something like:
op= {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN],'ID':['a1','b1','c1']}
op = pd.DataFrame(data=op)
op
Does the pd.merge have any functionality to perform the operations above ?
Upvotes: 0
Views: 31
Reputation: 2202
The way I would do it would be to merge both dataframes and then build the desired column using .loc
. Example below:
import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)
cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)
isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)
df = (ip
.merge(cusipmap.rename(columns={"CUSIP": "instrument"}), how="left", on="instrument")
.merge(isinmap.rename(columns={"ISIN": "instrument"}), how="left", on="instrument", suffixes=["CUSIP", "ISIN"]))
df["ID"] = df.IDCUSIP
df.loc[df.ID.isnull(), "ID"] = df.loc[df.ID.isnull(), "IDISIN"]
df = df.drop(["IDCUSIP", "IDISIN"], axis=1)
Upvotes: 0
Reputation: 862406
You can create new Series
with Series.map
and then replace missing values by another Series with Series.fillna
or Series.combine_first
:
s1 = ip['instrument'].map(cusipmap.set_index('CUSIP')['ID'])
s2 = ip['instrument'].map(isinmap.set_index('ISIN')['ID'])
ip['Id'] = s1.fillna(s2)
print (ip)
instrument CUSIP ISIN Id
0 a US1 NaN a1
1 b NaN EU1 b1
2 c US2 NaN c1
Solution with merge
:
s1 = ip.merge(cusipmap.rename(columns={'CUSIP':'instrument'}),on='instrument',how='left')['ID']
s2 = ip.merge(isinmap.rename(columns={'ISIN':'instrument'}),on='instrument',how='left')['ID']
ip['Id'] = s1.combine_first(s2)
print (ip)
instrument CUSIP ISIN Id
0 a US1 NaN a1
1 b NaN EU1 b1
2 c US2 NaN c1
Upvotes: 1