coco
coco

Reputation: 951

pandas left join on one key and one column value inside other column

If I have 2 panda df, I want to let A left join B on 2 conditions, 1) A.id = B.id 2) A.x in B.x_set. Can anyone help me on this? :

A:

id  x
1   a
2   b
3   c

B:

id  x_set   detail
1   a,b,c   x
1   d       y
2   a,c     z
2   d       m
2   b       n
3   a       i
3   b,c     j

The final table should be like this:

id  x   detail
1   a   x
2   b   n
3   c   j

Upvotes: 1

Views: 1592

Answers (1)

rrcal
rrcal

Reputation: 3752

If using pandas==0.25, you can:

  1. Transform the values to list
  2. Explode the list into new rows
  3. Merge back with A using pd.merge
B['x_set'] = B['x_set'].apply(lambda x: x.split(','))
B = B.explode('x_set')

A.merge(B, left_on=['id','x'], right_on=['id','x_set'])

Out[11]: 
   id  x x_set detail
0   1  a     a      x
1   2  b     b      n
2   3  c     c      j

If pandas<0.25:

  1. Transform the values to list
  2. Get flatten list of x
  3. Create a new dataframe with the new list
  4. Pass the id and detail using pd.Series.repeat
  5. Merge with A (we can use the same keys here)
B['x_set'] = B['x_set'].apply(lambda x: x.split(','))

len_set = B['x_set'].apply(len).values

values = B['x_set'].values.flatten().tolist()
flat_results = [item for sublist in values for item in sublist]

new_B = pd.DataFrame(flat_results, columns=['x'])

new_B['id'] = B['id'].repeat(len_set).values
new_B['detail'] = B['detail'].repeat(len_set).values

A.merge(new_B, on=['id','x'])
Out[32]: 
   id  x detail
0   1  a      x
1   2  b      n
2   3  c      j

Upvotes: 2

Related Questions