Reputation: 40
I am trying to fill a dataframe, containing repeated elements, based on an identifier. My Dataframe is as follows:
Code Value
0 SJHV
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
4 CPO3 22A
5 SJHV 33C #< -- Numbers stored as strings
6 TOY
7 TOY #< -- These aren't NaN, they are empty strings
I would like to remove the empty 'Value' rows only if a non-empty 'Value' row exists. To be clear, I would want my output to look like:
Code Value
0 SJHV 33C
1 SJIO 96B
2 CPO3 22A
3 TOY
My attempt was as follows:
df['Value'].replace('', np.nan, inplace=True)
df2 = df.dropna(subset=['Value']).drop_duplicates('Code')
As expected, this code also drops the 'TOY' Code. Any suggestions?
Upvotes: 0
Views: 757
Reputation: 16147
The empty strings should go to the bottom if you sort them, then you can just drop duplicates.
import pandas as pd
df = pd.DataFrame({'Code':['SJHV','SJIO','SJHV','CPO3','CPO3','SJHV','TOY','TOY'],'Value':['','96B','33C','22A','22A','33C','','']})
df = (
df.sort_values(by=['Value'], ascending=False)
.drop_duplicates(subset=['Code'], keep='first')
.sort_index()
)
Output
Code Value
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
6 TOY
Upvotes: 1