Reputation: 119
I have a data frame with duplicate rows. Some columns have data other have zeros. I want to retain the rows with data and eliminate those with zeros, but I also want to remove duplicates when there is only zeros. Here is an example
df =
A B
x 0
x 1
y 0
y 0
z 1
I want it to be
A B
x 1
y 0
z 1
I am struggling to drop duplicates hierarchically; first keeping the nonzero rows where columns values are different from zero, and then removing duplicates of the rows with column values are zero.
thank you
Upvotes: 0
Views: 206
Reputation: 153460
Use sort_values
and parameter keep='last'
:
df.sort_values(['A','B']).drop_duplicates(subset='A', keep='last')
Output:
A B
1 x 1
3 y 0
4 z 1
Update for negative values:
df.assign(sortkey = df.B.ne(0))
.sort_values(['A','sortkey'])
.drop_duplicates(subset='A', keep='last')
Upvotes: 3
Reputation: 323226
Or maybe you can using groupby
df.groupby('A',as_index=False).B.last()
Out[421]:
A B
0 x 1
1 y 0
2 z 1
Upvotes: 1