Reputation: 165
I have a two dataframes.
df = pd.DataFrame([[1,2,3,[4,5]],[6,7,8,[9,10]]], columns=['a','b','c','d'])
df2 = pd.DataFrame([[4,'abc'],[5,'ef'], [10,'g'], [12,'hijk']], columns=['a_2','b_2'])
In [151]: df
Out[151]:
a b c d
0 1 2 3 [4, 5]
1 6 7 8 [9, 10]
In [152]: df2
Out[152]:
a_2 b_2
0 4 abc
1 5 ef
2 10 g
3 12 hijk
I want to merge the two based on column 'd' of df and get the following output-
df3 = pd.DataFrame([[1,2,3,[4,5],['abc','ef']],[6,7,8,[9,10],['g']]], columns=['a','b','c','d','b_2'])
In [153]: df3
Out[153]:
a b c d b_2
0 1 2 3 [4, 5] [abc, ef]
1 6 7 8 [9, 10] [g]
I did try with 'merge' but I am not getting the required results.
Upvotes: 2
Views: 123
Reputation: 8768
Try this:
(pd.merge(
df.explode('d').reset_index(),
df2,left_on='d',
right_on='a_2',
how = 'left'
)
.groupby(['index',*list('abc')])[['d','b_2']].agg(lambda x: x.dropna().tolist())
.reset_index()
.drop('index',axis=1))
Output:
a b c d b_2
0 1 2 3 [4, 5] [abc, ef]
1 6 7 8 [9, 10] [g]
Upvotes: 0
Reputation: 765
col1=df1.apply(lambda ss:df2.query("a_2 in @ss.d").b_2.tolist(),1)
df1.assign(b_2=col1)
:
┌───────┬───────┬───────┬───────┬─────────┬───────────┐
│ index │ a │ b │ c │ d │ b_2 │
│ int64 │ int64 │ int64 │ int64 │ int32[] │ varchar[] │
├───────┼───────┼───────┼───────┼─────────┼───────────┤
│ 0 │ 1 │ 2 │ 3 │ [4, 5] │ [abc, ef] │
│ 1 │ 6 │ 7 │ 8 │ [9, 10] │ [g] │
└───────┴───────┴───────┴───────┴─────────┴───────────┘
Upvotes: 0
Reputation: 402493
This isn't exactly a merge problem, but I'd do this using a list comprehension that calls Series.get
:
s = df2.set_index('a_2')['b_2'] # mapping to use
[[s.get(y) for y in x if y in s] for x in df['d']]
# [['abc', 'ef'], ['g']]
# df['b_2'] = [[s.get(y) for y in x if y in s] for x in df['d']]
df3 = df.assign(b_2=[[s.get(y) for y in x if y in s] for x in df['d']])
df3
a b c d b_2
0 1 2 3 [4, 5] [abc, ef]
1 6 7 8 [9, 10] [g]
Evidence suggests that list comprehensions should be reasonably fast given the non-vectorizable nature of your problem.
Upvotes: 2