Number Logic
Number Logic

Reputation: 894

handle multiple pandas merge with nulls

I have three data frames:

Frame1:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)
ip

Frame2:

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)
cusipmap

Frame3:

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)
isinmap

I want to add a column to frame 1 , keeping the number of rows in Frame 1 constant ( left join) by first joining on CUSIP on Frame 2 and Then with ISIN on Frame 3. HOWEVER, I want to do the ISIN join on frame 3 only if there is a NaN produced as part of the Frame 2 join. SO, the output that I am looking for is something like:

op= {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN],'ID':['a1','b1','c1']}
op = pd.DataFrame(data=op)
op

Does the pd.merge have any functionality to perform the operations above ?

Upvotes: 0

Views: 31

Answers (2)

ivallesp
ivallesp

Reputation: 2202

The way I would do it would be to merge both dataframes and then build the desired column using .loc. Example below:

import pandas as pd
import numpy as np
d = {'instrument': ['a','b','c'], 'CUSIP': ['US1',np.NaN,'US2'],'ISIN':[np.NaN,"EU1",np.NaN]}
ip = pd.DataFrame(data=d)

cusipid={'CUSIP':['a','c'],'ID':["a1","c1"]}
cusipmap=pd.DataFrame(data=cusipid)

isinid={'ISIN':['b','c'],'ID':["b1","c2"]}
isinmap=pd.DataFrame(data=isinid)


df = (ip
      .merge(cusipmap.rename(columns={"CUSIP": "instrument"}), how="left", on="instrument")
      .merge(isinmap.rename(columns={"ISIN": "instrument"}), how="left", on="instrument", suffixes=["CUSIP", "ISIN"]))

df["ID"] = df.IDCUSIP
df.loc[df.ID.isnull(), "ID"] = df.loc[df.ID.isnull(), "IDISIN"]
df = df.drop(["IDCUSIP", "IDISIN"], axis=1)

Upvotes: 0

jezrael
jezrael

Reputation: 862406

You can create new Series with Series.map and then replace missing values by another Series with Series.fillna or Series.combine_first:

s1 = ip['instrument'].map(cusipmap.set_index('CUSIP')['ID'])
s2 = ip['instrument'].map(isinmap.set_index('ISIN')['ID'])
ip['Id'] = s1.fillna(s2)
print (ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1

Solution with merge:

s1 = ip.merge(cusipmap.rename(columns={'CUSIP':'instrument'}),on='instrument',how='left')['ID']
s2 = ip.merge(isinmap.rename(columns={'ISIN':'instrument'}),on='instrument',how='left')['ID']

ip['Id'] = s1.combine_first(s2)
print (ip)
  instrument CUSIP ISIN  Id
0          a   US1  NaN  a1
1          b   NaN  EU1  b1
2          c   US2  NaN  c1

Upvotes: 1

Related Questions