Ravi P

Reputation: 167

How do I delete duplicate rows in pandas by a column, without comparing None to None values?

I implemented my solution with a very inefficient loop, but there has to be a better approach that I can't think of.

My data:

|ticker|cusip|
|------|-----|
|DIS   | 123 |
|DIS   | None|
|None  | abc |
|None  | xyz |
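For reference, a minimal reconstruction of this DataFrame. This assumes the "None" entries are literal strings, as the `== 'None'` / `.eq('None')` comparisons in the answers imply; if they are real missing values, use Python `None` instead.

```python
import pandas as pd

# Sample data from the question; "None" is assumed to be a literal
# string here, matching the string comparisons in the answers
df = pd.DataFrame({
    "ticker": ["DIS", "DIS", "None", "None"],
    "cusip": ["123", "None", "abc", "xyz"],
})
print(df)
```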

If I sort on cusip with ascending=False and then drop_duplicates, it removes the DIS row whose cusip is None, as intended. But at the same time it removes the bottom rows whose ticker is None, which I want to keep.

I found the duplicate rows, then looped over each duplicate group and applied drop_duplicates, which is very inefficient since the loop runs across thousands of rows.

Is there an option to ignore None rows in the duplicate check?

Upvotes: 0

Views: 468

Answers (2)

Carmoreno

Reputation: 1319

You can use Series.where to convert 'None' values to NaN, then DataFrame.dropna with the subset parameter.

df['cusip'] = df['cusip'].where(~(df['cusip'] == 'None'))  # turn 'None' strings into NaN
df = df.dropna(subset=['cusip']).reset_index(drop=True)    # drop rows whose cusip is NaN
print(df)

Output:

  ticker cusip
0    DIS   123
1   None   abc
2   None   xyz

Upvotes: 0

BENY

Reputation: 323276

Try duplicated with an or condition for 'None':

out = df[~df.duplicated('ticker') | df.ticker.eq('None')]
Out[448]: 
  ticker cusip
0    DIS   123
2   None   abc
3   None   xyz
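If the missing values are real `None`/NaN rather than the string `'None'`, a sketch of the same idea using `Series.isna` (this assumes a DataFrame built with actual `None` values, which is not confirmed by the question):

```python
import pandas as pd

# Assumed variant of the question's data with real missing values
df = pd.DataFrame({
    "ticker": ["DIS", "DIS", None, None],
    "cusip": ["123", None, "abc", "xyz"],
})

# Sort so the row with a real cusip comes first within each ticker
# (NaN sorts last by default), then keep a row if it is either the
# first occurrence of its ticker or its ticker is missing
df = df.sort_values("cusip")
out = df[~df.duplicated("ticker") | df["ticker"].isna()]
print(out.sort_index())
```

The explicit `isna()` guard is needed because `duplicated` treats NaN values as equal to each other, so without it all but the first NaN-ticker row would be dropped.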

Upvotes: 3
