Reputation: 3640

pandas merge rows based on grouping

Let's say I have a dataframe that looks like this:

col1    col2    col3
a       1       a
a       98      xx
a       99      xy
b       1       a
b       2       b
b       3       c
b       8       xx
b       9       xy

I need to merge rows where in col3 = xx and xy, which are grouped by col1, hence the resulting dataframe looks like:

col1    col2    col3
a       1       a
a       98      xz
b       1       a
b       2       b
b       3       c
b       8       xz

Is there a simple way of doing this pandas?

Upvotes: 0

Answers (2)

Quang Hoang

Reputation: 150805

Here's my approach with drop_duplicates:

# xx and xy
s = df.col3.isin(['xx','xy']);

(df.assign(col3=lambda x: np.where(s, 'xz', x['col3']), # replace xx and xy with xz
           mask=s,                                      # where xx and xy
           block=(~s).cumsum())                         # block of xx and xy
   .drop_duplicates(['col1','mask','block'])
   .drop(['mask','block'], axis=1)
)

Output:

  col1  col2 col3
0    a     1    a
1    a    98   xz
3    b     1    a
4    b     2    b
5    b     3    c
6    b     8   xz

Upvotes: 0

BENY

Reputation: 323376

IIUC

df.groupby([df.col1,df.col3.replace({'xx':'xz','xy':'xz'})]).col2.first().reset_index()
Out[29]: 
  col1 col3  col2
0    a    a     1
1    a   xz    98
2    b    a     1
3    b    b     2
4    b    c     3
5    b   xz     8

Upvotes: 2

pandas merge rows based on grouping

Answers (2)

Related Questions