Eoin Vaughan

Reputation: 173

Fill subsequent values beneath an existing value in pandas dataframe column

I have a pandas DataFrame, df.

I want to fill the empty cells in a column with the last non-empty value above them, and when I reach a new value, start filling with that one instead.

That way the dept column is complete, and I can merge this dataset with another to link department info to PIs.

I don't know the best approach. Is there a vectorized way to do this, or would it require looping, maybe using iterrows() or itertuples()?

import numpy as np
import pandas as pd

data = {"dept": ["Emergency Medicine", "", "", "", "Family Practice", "", ""],
        "pi": [np.nan, "Tiger Woods", "Michael Jordan", "Roger Federer", np.nan, "Serena Williams", "Alex Morgan"]
        }

df = pd.DataFrame(data=data)

        dept                  pi
0       Emergency Medicine  
1                             Tiger Woods
2                             Michael Jordan
3                             Roger Federer
4       Family Practice 
5                             Serena Williams
6                             Alex Morgan

desired_df

        dept                  pi
0       Emergency Medicine  
1       Emergency Medicine    Tiger Woods
2       Emergency Medicine    Michael Jordan
3       Emergency Medicine    Roger Federer
4       Family Practice 
5       Family Practice       Serena Williams
6       Family Practice       Alex Morgan

Upvotes: 0

Views: 42

Answers (1)

Quang Hoang

Reputation: 150735

Use where to replace the empty strings with NaN, then ffill to forward-fill them:

# if you have empty strings
mask = df['dept'].ne('')
df['dept'] = df['dept'].where(mask).ffill()

# otherwise, just
# df['dept'] = df['dept'].ffill()

Output:

                 dept               pi
0  Emergency Medicine              NaN
1  Emergency Medicine      Tiger Woods
2  Emergency Medicine   Michael Jordan
3  Emergency Medicine    Roger Federer
4     Family Practice              NaN
5     Family Practice  Serena Williams
6     Family Practice      Alex Morgan
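
For reference, here is a self-contained sketch of the same approach, assuming the blank dept cells are empty strings as in the question's sample data. It also shows mask as an equivalent way to blank out the empty strings, and an optional dropna step (an assumption about what the later merge needs) that removes the department header rows with no PI:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dept": ["Emergency Medicine", "", "", "", "Family Practice", "", ""],
    "pi": [np.nan, "Tiger Woods", "Michael Jordan", "Roger Federer",
           np.nan, "Serena Williams", "Alex Morgan"],
})

# Keep non-empty values, turn empty strings into NaN, then forward-fill
filled = df["dept"].where(df["dept"].ne("")).ffill()

# Equivalent: mask out the rows where dept is an empty string
filled_alt = df["dept"].mask(df["dept"].eq("")).ffill()

df["dept"] = filled

# Optional (assumption): drop the header rows that carry no PI before merging
pis_only = df.dropna(subset=["pi"]).reset_index(drop=True)
print(pis_only)
```

Both versions are vectorized, so no iterrows() or itertuples() loop is needed.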

Upvotes: 4
