user10689363
user10689363

Reputation: 23

Select value from third column after matching condition of two columns

df = pd.DataFrame({
    'x': ['p1','p2','p3','p2','p3','p4',],
    'y': ['p2','p3','p4','p3','p4','p5'],
    'z': ['100','200','300','400','500','600']
})
Expected outcome: 
|     x    |     y    |     z    |     a    |
| -------- | ---------|----------|----------|
|     p1   |     p2   |     100  |   Nan    |
|     p2   |     p3   |     200  |   100    |
|     p3   |     p4   |     300  |   200    |
|     p2   |     p3   |     400  |   300    |
|     p3   |     p4   |     500  |   400    |
|     p4   |     p5   |     600  |   500    |

I was trying with below command but did not get the desired result

df['a']=df.loc[df['x'].isin(df['y']),'z']

Upvotes: 1

Views: 256

Answers (1)

jezrael
jezrael

Reputation: 862641

Here are duplicated in x and y, so solution is more complicated:

df['g1'] = df.groupby('x').cumcount()
df['g2'] = df.groupby('y').cumcount()

mapped = df.set_index(df[['y','g2']].apply(tuple, 1))['z']
df['new'] = df[['x','g1']].apply(tuple, 1).map(mapped)
print (df)
    x   y    z  g1  g2  new
0  p1  p2  100   0   0  NaN
1  p2  p3  200   0   0  100
2  p3  p4  300   0   0  200
3  p2  p3  400   1   1  NaN
4  p3  p4  500   1   1  400
5  p4  p5  600   0   0  300

Upvotes: 2

Related Questions