Reputation: 2607
I have a dataframe similar to this:
index a b c d
0 1 1 1 3
1 1 1 2 1
2 1 2 1 4
3 1 2 2 1
4 2 2 1 5
5 2 2 2 9
6 2 2 1 2
7 2 3 2 6
I want to add new rows where c is 0 and d is the maximum value of d among the existing rows that share the same a and b:
index a b c d
8 1 1 0 3
9 1 2 0 4
10 2 2 0 9
11 2 3 0 6
What command can I use? Thanks!
Upvotes: 1
Views: 278
Reputation: 25249
I came up with a solution using groupby and pd.concat, as follows:
pd.concat([df, df.groupby(['a', 'b'])['d'].max().reset_index().assign(c=0)], ignore_index=True)
Out[1668]:
a b c d
0 1 1 1 3
1 1 1 2 1
2 1 2 1 4
3 1 2 2 1
4 2 2 1 5
5 2 2 2 9
6 2 2 1 2
7 2 3 2 6
8 1 1 0 3
9 1 2 0 4
10 2 2 0 9
11 2 3 0 6
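For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby + pd.concat idea. The frame is rebuilt from the question's sample data, assuming a default integer index; the names group_max and result are only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question (default RangeIndex assumed).
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2, 2, 2],
    'b': [1, 1, 2, 2, 2, 2, 2, 3],
    'c': [1, 2, 1, 2, 1, 2, 1, 2],
    'd': [3, 1, 4, 1, 5, 9, 2, 6],
})

# One row per (a, b) pair, holding the largest d in that group.
group_max = df.groupby(['a', 'b'])['d'].max().reset_index()

# Tag the summary rows with c=0 and stack them under the original rows;
# ignore_index=True renumbers the combined frame from 0.
result = pd.concat([df, group_max.assign(c=0)], ignore_index=True)
print(result)
```

Because groupby sorts its keys by default, the new rows come out ordered by a and b, and ignore_index=True gives them the labels 8 to 11.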
Upvotes: 2
Reputation: 323276
It seems you can chain sort_values with drop_duplicates, then append the result:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df, df.sort_values('d').drop_duplicates(['a','b'],keep='last').assign(c=0)])
Out[77]:
a b c d
index
0 1 1 1 3
1 1 1 2 1
2 1 2 1 4
3 1 2 2 1
4 2 2 1 5
5 2 2 2 9
6 2 2 1 2
7 2 3 2 6
0 1 1 0 3
2 1 2 0 4
7 2 3 0 6
5 2 2 0 9
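The output above keeps the original index labels (0, 2, 7, 5) and orders the new rows by d. If you want them numbered 8 to 11 and ordered by a and b as in the question, a sketch along these lines should work. It rebuilds the sample frame assuming a default integer index, uses pd.concat for the final step (DataFrame.append is gone in pandas 2.0), and the names top, new_rows and result are only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question (default RangeIndex assumed).
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2, 2, 2],
    'b': [1, 1, 2, 2, 2, 2, 2, 3],
    'c': [1, 2, 1, 2, 1, 2, 1, 2],
    'd': [3, 1, 4, 1, 5, 9, 2, 6],
})

# After sorting by d, the last row of each (a, b) pair is the one with the max d.
top = df.sort_values('d').drop_duplicates(['a', 'b'], keep='last')

# Reorder the picked rows by a and b, then overwrite c with 0.
new_rows = top.sort_values(['a', 'b']).assign(c=0)

# Stack under the original rows and renumber the index from 0.
result = pd.concat([df, new_rows], ignore_index=True)
print(result)
```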
Upvotes: 3