Reputation: 3
I want to merge two data frames, both fetched via SQL, under multiple conditions.
df1 and df2 are shown below:
df1:
Customer ID  Cluster ID  Customer Zone ID
CUS1001.A    CUS1001.X   CUS1000
CUS1001.B    CUS1001.X   CUS1000
CUS1001.C    CUS1001.X   CUS1000
CUS1001.D    CUS1001.X   CUS1000
CUS1001.E    CUS1001.X   CUS1000
CUS2001.A    CUS2001.X   CUS2000
df2:
Complain ID  RegistrationNumber  Status
CUS3501.A    99231               open
CUS1001.B    21340               open
CUS1001.X    32100               open
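For anyone who wants to reproduce this, the two frames can be rebuilt roughly as follows. This is only a sketch: the values are copied from the tables above, and I am assuming the column names contain spaces exactly as shown and that RegistrationNumber is an integer column.

```python
import pandas as pd

# Customer table: one row per customer, with its cluster and zone IDs
df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                    'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'Customer Zone ID': ['CUS1000'] * 5 + ['CUS2000'],
})

# Complaint table: the complaint ID may refer to a customer, a cluster or a zone
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

print(df1)
print(df2)
```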
I want to merge these two data frames under the following conditions:
if Complain ID == Customer ID:
    merge on Customer ID
elif Complain ID == Cluster ID:
    merge on Customer ID
elif Complain ID == Customer Zone ID:
    merge on Customer ID
else:
    merge an empty row (zeros)
The final result should look like this:
Customer ID  Cluster ID  Customer Zone ID  Complain ID  RegistrationNumber  Status
CUS1001.A    CUS1001.X   CUS1000           CUS1001.X    32100               open
CUS1001.B    CUS1001.X   CUS1000           CUS1001.B    21340               open
CUS1001.C    CUS1001.X   CUS1000           CUS1001.X    32100               open
.            .           .                 .            .                   .
.            .           .                 .            .                   .
CUS2001.A    CUS2001.X   CUS2000           0            0                   0
Please help!
Upvotes: 0
Views: 2837
Reputation: 323346
Try this, using pandas melt, merge and concat:
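import pandas as pd
# Stack the ID columns into long form (columns: variable, value)
df=pd.melt(df1)
# Look up every stacked ID among the complaint IDs
df=df.merge(df2,left_on='value',right_on='Complain ID',how='left')
# 'number' is the original df1 row position of each stacked value
df['number']=df.groupby('variable').cumcount()
# Within each original row, back-fill so the Customer ID row picks up the first match
df=df.groupby('number').bfill()
# The first len(df1) rows are the Customer ID rows; select df2's columns by name
Target=pd.concat([df1,df.iloc[:len(df1)][df2.columns]],axis=1).fillna(0)
Target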
Out[39]:
Customer ID Cluster ID Customer Zone ID Complain ID RegistrationNumber \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100.0
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340.0
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100.0
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X 32100.0
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X 32100.0
5 CUS2001.A CUS2001.X CUS2000 0 0.0
Status
0 open
1 open
2 open
3 open
4 open
5 0
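To see why the back-fill works, it may help to look at the intermediate long-form frame. The snippet below is a small sketch on just the first two customers (the variable name `long` is mine, not part of the answer above): after melt, the Customer ID rows come first, then the Cluster ID rows, then the Customer Zone ID rows, and cumcount tags each value with the df1 row it came from.

```python
import pandas as pd

# Two customers only, to keep the printed frame short
df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B'],
    'Cluster ID': ['CUS1001.X', 'CUS1001.X'],
    'Customer Zone ID': ['CUS1000', 'CUS1000'],
})

# Long form: one row per (source column, ID value)
long = pd.melt(df1)

# Position within each source column, which equals the original df1 row position
long['number'] = long.groupby('variable').cumcount()

print(long)
```

Rows that share the same number belong to the same customer, so back-filling within each number group moves a Cluster ID or Customer Zone ID match up onto the Customer ID row whenever that row has no match of its own.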
Update
Using numpy's intersect1d. Personally, I like this approach more than the previous one. (The column names here are written without spaces, e.g. CustomerID.)
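import numpy as np
# IDs of each row that also occur among the complaint IDs (intersect1d returns them sorted)
matches=[np.intersect1d(x,df2.ComplainID.values) for x in df1[['CustomerID','ClusterID']].values]
# Item assignment creates the column; keep the first match, NaN if there is none
df1['MatchId']=[m[0] if len(m) else np.nan for m in matches]
df1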
Out[307]:
CustomerID ClusterID CustomerZoneID MatchId
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN
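As a quick illustration of what intersect1d returns for a single row, here is a small sketch with the complaint IDs from the question (the names `both`, `one` and `none` are just for this example). The result is a sorted array of the common values, so a row can yield two matches, one match, or an empty array.

```python
import numpy as np

complaint_ids = np.array(['CUS3501.A', 'CUS1001.B', 'CUS1001.X'])

# Customer ID and Cluster ID both have a complaint: two values, sorted
both = np.intersect1d(['CUS1001.B', 'CUS1001.X'], complaint_ids)

# Only the Cluster ID has a complaint: one value
one = np.intersect1d(['CUS1001.A', 'CUS1001.X'], complaint_ids)

# Neither ID has a complaint: empty array
none = np.intersect1d(['CUS2001.A', 'CUS2001.X'], complaint_ids)

print(both, one, none)
```

Because the values come back sorted, taking the first element picks the alphabetically smallest match, which happens to coincide with "Customer ID first" for this data but is not guaranteed to in general.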
df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
Out[311]:
CustomerID ClusterID CustomerZoneID MatchId ComplainID \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN NaN
RegistrationNumber Status
0 32100.0 open
1 21340.0 open
2 32100.0 open
3 32100.0 open
4 32100.0 open
5 NaN NaN
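If the order of the conditions matters (Customer ID first, then Cluster ID, then Customer Zone ID), one more way to write it is to blank out every ID that has no complaint and back-fill across the columns. This is only a sketch, not one of the two approaches above: the names `priority`, `candidates`, `match` and the temporary 'Match ID' column are made up for the example, the frames are rebuilt from the question's data, and unmatched rows are left as NaN rather than 0.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Customer ID': ['CUS1001.A', 'CUS1001.B', 'CUS1001.C',
                    'CUS1001.D', 'CUS1001.E', 'CUS2001.A'],
    'Cluster ID': ['CUS1001.X'] * 5 + ['CUS2001.X'],
    'Customer Zone ID': ['CUS1000'] * 5 + ['CUS2000'],
})
df2 = pd.DataFrame({
    'Complain ID': ['CUS3501.A', 'CUS1001.B', 'CUS1001.X'],
    'RegistrationNumber': [99231, 21340, 32100],
    'Status': ['open', 'open', 'open'],
})

# Columns to check, in the order the conditions should be tried
priority = ['Customer ID', 'Cluster ID', 'Customer Zone ID']

# Keep an ID only if it appears among the complaint IDs, otherwise NaN
candidates = df1[priority].where(df1[priority].isin(df2['Complain ID'].tolist()))

# Back-fill across columns: the first column then holds the highest-priority match
match = candidates.bfill(axis=1).iloc[:, 0]

# Left-merge on the chosen ID, then drop the temporary key column
result = (
    df1.assign(**{'Match ID': match})
       .merge(df2, left_on='Match ID', right_on='Complain ID', how='left')
       .drop(columns='Match ID')
)
print(result)
```

The last customer (CUS2001.A) has no complaint under any of its three IDs, so its complaint columns stay empty.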
Upvotes: 1