Julianno Sambatti
Julianno Sambatti

Reputation: 119

Remove duplicate rows upon multiple conditions

I have a data frame with duplicate rows. Some columns have data other have zeros. I want to retain the rows with data and eliminate those with zeros, but I also want to remove duplicates when there is only zeros. Here is an example

df =

A   B
x   0
x   1
y   0
y   0
z   1

I want it to be

A   B
x   1
y   0
z   1

I am struggling to drop duplicates hierarchically; first keeping the nonzero rows where columns values are different from zero, and then removing duplicates of the rows with column values are zero.

thank you

Upvotes: 0

Views: 206

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Use sort_values and parameter keep='last':

df.sort_values(['A','B']).drop_duplicates(subset='A', keep='last')

Output:

   A  B
1  x  1
3  y  0
4  z  1

Update for negative values:

df.assign(sortkey = df.B.ne(0))
  .sort_values(['A','sortkey'])
  .drop_duplicates(subset='A', keep='last')

Upvotes: 3

BENY
BENY

Reputation: 323226

Or maybe you can using groupby

df.groupby('A',as_index=False).B.last()
Out[421]: 
   A  B
0  x  1
1  y  0
2  z  1

Upvotes: 1

Related Questions