Gnubie

How to add aggregated rows based on other rows in Pandas dataframe

I have a dataframe similar to this:

index a b c d
    0 1 1 1 3
    1 1 1 2 1
    2 1 2 1 4
    3 1 2 2 1
    4 2 2 1 5
    5 2 2 2 9
    6 2 2 1 2
    7 2 3 2 6

I want to add one new row for each distinct (a, b) pair, with c set to 0 and d set to the maximum d among the existing rows that share that (a, b) pair:

index a b c d
    8 1 1 0 3
    9 1 2 0 4
   10 2 2 0 9
   11 2 3 0 6

What command can I use? Thanks!
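To make the question reproducible, here is a minimal sketch that builds the sample frame shown above. Naming the index `index` is an assumption based on the printed header.

```python
import pandas as pd

# Sample data from the question; the index is named "index" to match the printout
df = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 1, 2, 2, 2, 2, 2, 3],
        "c": [1, 2, 1, 2, 1, 2, 1, 2],
        "d": [3, 1, 4, 1, 5, 9, 2, 6],
    }
)
df.index.name = "index"
print(df)
```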

Answers (2)

Andy L.

I came up with a solution using groupby and pd.concat:

pd.concat([df, df.groupby(['a', 'b'])['d'].max().reset_index().assign(c=0)], ignore_index=True)


Out[1668]:
    a  b  c  d
0   1  1  1  3
1   1  1  2  1
2   1  2  1  4
3   1  2  2  1
4   2  2  1  5
5   2  2  2  9
6   2  2  1  2
7   2  3  2  6
8   1  1  0  3
9   1  2  0  4
10  2  2  0  9
11  2  3  0  6    
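The one-liner above can be read as three steps. The sketch below splits it into intermediate variables (the names `max_d`, `new_rows` and `result` are illustrative only) and rebuilds the sample data so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 1, 2, 2, 2, 2, 2, 3],
        "c": [1, 2, 1, 2, 1, 2, 1, 2],
        "d": [3, 1, 4, 1, 5, 9, 2, 6],
    }
)

# 1. Maximum d for each (a, b) pair; the result is a Series indexed by (a, b)
max_d = df.groupby(["a", "b"])["d"].max()

# 2. Turn the (a, b) index back into columns, then add the constant column c = 0
new_rows = max_d.reset_index().assign(c=0)

# 3. Stack the new rows under the original ones; concat aligns columns by name,
#    so the column order of new_rows (a, b, d, c) does not matter
result = pd.concat([df, new_rows], ignore_index=True)
print(result)
```

Because `ignore_index=True` is passed, the new rows get fresh labels 8 to 11 instead of the 0 to 3 labels produced by `reset_index`.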

BENY

It seems you can chain sort_values with drop_duplicates, then append the result to the original frame:

# DataFrame.append was removed in pandas 2.0, so pd.concat is used here
pd.concat([df, df.sort_values('d').drop_duplicates(['a','b'], keep='last').assign(c=0)])
Out[77]: 
       a  b  c  d
index            
0      1  1  1  3
1      1  1  2  1
2      1  2  1  4
3      1  2  2  1
4      2  2  1  5
5      2  2  2  9
6      2  2  1  2
7      2  3  2  6
0      1  1  0  3
2      1  2  0  4
7      2  3  0  6
5      2  2  0  9
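Why this works: after sorting by d in ascending order, the last row of each (a, b) group is one that holds the group's maximum d, so `keep='last'` retains exactly that row, and `assign(c=0)` then overwrites c. The sketch below checks this against the groupby maximum; it uses `pd.concat` rather than `DataFrame.append`, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 1, 1, 1, 2, 2, 2, 2],
        "b": [1, 1, 2, 2, 2, 2, 2, 3],
        "c": [1, 2, 1, 2, 1, 2, 1, 2],
        "d": [3, 1, 4, 1, 5, 9, 2, 6],
    }
)
df.index.name = "index"

# Sort by d ascending, so the last row of each (a, b) group carries the group's max d
last_rows = df.sort_values("d").drop_duplicates(["a", "b"], keep="last").assign(c=0)

# Keep the original index labels, as in the answer's output
result = pd.concat([df, last_rows])

# Cross-check against the groupby maximum (rows ordered by a, b for the comparison)
expected = df.groupby(["a", "b"])["d"].max()
check = last_rows.set_index(["a", "b"])["d"].sort_index()
assert check.equals(expected)
print(result)
```

Note that the appended rows keep their original labels (0, 2, 7, 5 here) and appear in order of d, not of (a, b); pass `ignore_index=True` to `pd.concat` if fresh labels are preferred.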
