Learner120
Learner120

Reputation: 40

Fill empty columns with values from another column of another row based on an identifier

I am trying to fill a dataframe, containing repeated elements, based on an identifier. My Dataframe is as follows:

   Code Value
0  SJHV   
1  SJIO    96B
2  SJHV    33C
3  CPO3    22A
4  CPO3    22A
5  SJHV    33C       #< -- Numbers stored as strings
6   TOY   
7   TOY             #< -- These aren't NaN, they are empty strings

I would like to remove the empty 'Value' rows only if a non-empty 'Value' row exists. To be clear, I would want my output to look like:

   Code Value
0  SJHV    33C
1  SJIO    96B
2  CPO3    22A      
3   TOY         

My attempt was as follows:

df['Value'].replace('', np.nan, inplace=True)

df2 = df.dropna(subset=['Value']).drop_duplicates('Code')

As expected, this code also drops the 'TOY' Code. Any suggestions?

Upvotes: 0

Views: 757

Answers (1)

Chris
Chris

Reputation: 16147

The empty strings should go to the bottom if you sort them, then you can just drop duplicates.

import pandas as pd
df = pd.DataFrame({'Code':['SJHV','SJIO','SJHV','CPO3','CPO3','SJHV','TOY','TOY'],'Value':['','96B','33C','22A','22A','33C','','']})
df = (
    df.sort_values(by=['Value'], ascending=False)
      .drop_duplicates(subset=['Code'], keep='first')
      .sort_index()
)
    

Output

   Code Value
1  SJIO   96B
2  SJHV   33C
3  CPO3   22A
6   TOY      

Upvotes: 1

Related Questions