Reputation: 167
I implemented my solution with a very inefficient loop, but there has to be a better approach that I can't think of.
My data:
|ticker|cusip|
|------|-----|
|DIS   |123  |
|DIS   |None |
|None  |abc  |
|None  |xyz  |
If I sort on cusip with ascending=False and drop_duplicates, it removes the DIS row whose cusip is None, as intended. But at the same time it removes the bottom rows whose ticker is None, which I want to keep.
I found the duplicate rows and then looped over each duplicate group, applying drop_duplicates to each, which is very inefficient since it's a huge loop across thousands of rows.
Is there an option to ignore None rows for the duplicate check?
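For reference, the sample frame can be reconstructed like this (a sketch, assuming the placeholders are the literal string 'None' rather than actual NaN, which is how both answers below treat them):
import pandas as pd

# Sample data; the 'None' entries are assumed to be strings,
# not real missing values.
df = pd.DataFrame({
    'ticker': ['DIS', 'DIS', 'None', 'None'],
    'cusip': ['123', 'None', 'abc', 'xyz'],
})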
Upvotes: 0
Views: 468
Reputation: 1319
You can use Series.where to convert 'None' values to NaN, plus DataFrame.dropna with the subset param.
# Replace the string 'None' with NaN so dropna can see it
df['cusip'] = df['cusip'].where(~(df['cusip'] == 'None'))
# Drop rows whose cusip is missing, then rebuild the index
df = df.dropna(subset=['cusip']).reset_index(drop=True)
print(df)
Output:
  ticker cusip
0    DIS   123
1   None   abc
2   None   xyz
Upvotes: 0
Reputation: 323276
Try duplicated with an or condition for the 'None' tickers:
# Keep the first row per ticker, but always keep rows whose ticker is 'None'
out = df[~df.duplicated('ticker') | df.ticker.eq('None')]
Out[448]:
ticker cusip
0 DIS 123
2 None abc
3 None xyz
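duplicated defaults to keep='first', so which row survives per ticker depends on row order. If a 'None' cusip could appear before the real one within a group, one sketch (again assuming string 'None' placeholders, and pandas >= 1.1 for sort_values' key argument) is to sort those rows to the end first:
# Sort rows whose cusip is the string 'None' to the end,
# so the duplicate check keeps the row with a real cusip.
tmp = df.sort_values('cusip', key=lambda s: s.eq('None'))

# Keep the first row per ticker, but never drop 'None' tickers,
# then restore the original row order.
out = tmp[~tmp.duplicated('ticker') | tmp['ticker'].eq('None')].sort_index()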
Upvotes: 3