Reputation: 23
df = pd.DataFrame({
'x': ['p1','p2','p3','p2','p3','p4',],
'y': ['p2','p3','p4','p3','p4','p5'],
'z': ['100','200','300','400','500','600']
})
Expected outcome:
| x | y | z | a |
| -------- | ---------|----------|----------|
| p1 | p2 | 100 | Nan |
| p2 | p3 | 200 | 100 |
| p3 | p4 | 300 | 200 |
| p2 | p3 | 400 | 300 |
| p3 | p4 | 500 | 400 |
| p4 | p5 | 600 | 500 |
I was trying with below command but did not get the desired result
df['a']=df.loc[df['x'].isin(df['y']),'z']
Upvotes: 1
Views: 256
Reputation: 862641
Here are duplicated in x
and y
, so solution is more complicated:
df['g1'] = df.groupby('x').cumcount()
df['g2'] = df.groupby('y').cumcount()
mapped = df.set_index(df[['y','g2']].apply(tuple, 1))['z']
df['new'] = df[['x','g1']].apply(tuple, 1).map(mapped)
print (df)
x y z g1 g2 new
0 p1 p2 100 0 0 NaN
1 p2 p3 200 0 0 100
2 p3 p4 300 0 0 200
3 p2 p3 400 1 1 NaN
4 p3 p4 500 1 1 400
5 p4 p5 600 0 0 300
Upvotes: 2