ARJ
ARJ

Reputation: 2080

Merge 2 data frames using values of column vector from dataframe with and column of list from another dataframe

Say I have two data frames,

df1.head()
AH1
A
B
C
D
F

The second data frame df2.head()

AH1 BC
A,B SAP
C,D XY
A,B,C   BMZ
A,F Zoom

What I need is to merge the columns AH1 from two data frames df1 and df2based on the same values. In the end, I am aiming to have something like this,

output
    AH1 BC
    A   SAP, BMZ,Zoom
    B   SAP,BMZ
    C   XY,BMZ
    D   XY
    F   Zoom

Any help or suggestions are appreciated. I know pandas.merge would go however not between column vector and column of the list.

Upvotes: 0

Views: 63

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You can do an explode:

(df2.assign(AH1=df2.AH1.str.split(','))
    .explode('AH1')
    .groupby('AH1')
    ['BC'].agg(list)
)

Output:

AH1
A    [SAP, BMZ, Zoom]
B          [SAP, BMZ]
C           [XY, BMZ]
D                [XY]
F              [Zoom]
Name: BC, dtype: object

Or replacing agg(list) with agg(', '.join):

(df2.assign(AH1=df2.AH1.str.split(','))
    .explode('AH1')
    .groupby('AH1')
    ['BC'].agg(', '.join)
)

Output:

AH1
A    SAP, BMZ, Zoom
B          SAP, BMZ
C           XY, BMZ
D                XY
F              Zoom
Name: BC, dtype: object

Upvotes: 1

Related Questions