Ravi P

Reputation: 167

How do I delete duplicate rows in pandas by a column, without comparing None to None values?

I implemented my solution with a very inefficient loop, but there has to be a better approach that I can't think of.

My data:

|ticker|cusip|
|------|-----|
|DIS   | 123 |
|DIS   | None|
|None  | abc |
|None  | xyz |
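For reference, a minimal reconstruction of this DataFrame. This assumes the "None" entries are literal strings, as the `== 'None'` / `.eq('None')` comparisons in the answers imply; if they are real missing values, use Python `None` instead.

```python
import pandas as pd

# Sample data from the question; "None" is assumed to be a literal
# string here, matching the string comparisons in the answers
df = pd.DataFrame({
    "ticker": ["DIS", "DIS", "None", "None"],
    "cusip": ["123", "None", "abc", "xyz"],
})
print(df)
```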

If I sort on cusip with ascending=False and then drop_duplicates, it removes the DIS row whose cusip is None, as intended. But at the same time it removes the bottom rows whose ticker is None, which I want to keep.

I found the duplicate rows, then looped over each duplicate group and applied drop_duplicates, which is very inefficient since the loop runs across thousands of rows.

Is there an option to ignore None rows in the duplicate check?

Upvotes: 0

Views: 468

Answers (2)

Carmoreno

Reputation: 1319

You can use Series.where to convert 'None' values to NaN, then DataFrame.dropna with the subset parameter.

df['cusip'] = df['cusip'].where(~(df['cusip'] == 'None'))  # turn 'None' strings into NaN
df = df.dropna(subset=['cusip']).reset_index(drop=True)    # drop rows whose cusip is NaN
print(df)

Output:

  ticker cusip
0    DIS   123
1   None   abc
2   None   xyz

Upvotes: 0

BENY

Reputation: 323276

Try duplicated with an or condition for 'None':

out = df[~df.duplicated('ticker') | df.ticker.eq('None')]
Out[448]: 
  ticker cusip
0    DIS   123
2   None   abc
3   None   xyz
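If the missing values are real `None`/NaN rather than the string `'None'`, a sketch of the same idea using `Series.isna` (this assumes a DataFrame built with actual `None` values, which is not confirmed by the question):

```python
import pandas as pd

# Assumed variant of the question's data with real missing values
df = pd.DataFrame({
    "ticker": ["DIS", "DIS", None, None],
    "cusip": ["123", None, "abc", "xyz"],
})

# Sort so the row with a real cusip comes first within each ticker
# (NaN sorts last by default), then keep a row if it is either the
# first occurrence of its ticker or its ticker is missing
df = df.sort_values("cusip")
out = df[~df.duplicated("ticker") | df["ticker"].isna()]
print(out.sort_index())
```

The explicit `isna()` guard is needed because `duplicated` treats NaN values as equal to each other, so without it all but the first NaN-ticker row would be dropped.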

Upvotes: 3
