Reputation: 173
I have a Pandas DataFrame df.
I want to fill the blank values in a column with the last non-blank value that preceded them, and start over with the new value whenever one appears.
That way the dept column is complete, and I can merge this dataset with another to link department info to the PIs.
I don't know the best approach. Is there a vectorized way to do this, or would it require looping, maybe with iterrows() or itertuples()?
import numpy as np
import pandas as pd

data = {"dept": ["Emergency Medicine", "", "", "", "Family Practice", "", ""],
        "pi": [np.nan, "Tiger Woods", "Michael Jordan", "Roger Federer", np.nan, "Serena Williams", "Alex Morgan"]
}
df = pd.DataFrame(data=data)
   dept                pi
0  Emergency Medicine
1                      Tiger Woods
2                      Michael Jordan
3                      Roger Federer
4  Family Practice
5                      Serena Williams
6                      Alex Morgan
desired_df
   dept                pi
0  Emergency Medicine
1  Emergency Medicine  Tiger Woods
2  Emergency Medicine  Michael Jordan
3  Emergency Medicine  Roger Federer
4  Family Practice
5  Family Practice     Serena Williams
6  Family Practice     Alex Morgan
Upvotes: 0
Views: 42
Reputation: 150735
Use where to mask the empty strings with NaN, then ffill to forward-fill them from the last valid value:
# if you have empty strings
mask = df['dept'].ne('')
df['dept'] = df['dept'].where(mask).ffill()
# otherwise, just
# df['dept'] = df['dept'].ffill()
Output:
   dept                pi
0  Emergency Medicine  NaN
1  Emergency Medicine  Tiger Woods
2  Emergency Medicine  Michael Jordan
3  Emergency Medicine  Roger Federer
4  Family Practice     NaN
5  Family Practice     Serena Williams
6  Family Practice     Alex Morgan
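For completeness, here is a self-contained sketch of the same idea that can be pasted and run as-is. It assumes the blanks in dept are empty strings, as in the question's sample data. The names filled and filled_alt are only illustrative, and the Series.mask variant is an equivalent alternative spelling, not something the question requires.

```python
import numpy as np
import pandas as pd

# Sample data from the question: empty strings mark the rows to fill.
data = {
    "dept": ["Emergency Medicine", "", "", "", "Family Practice", "", ""],
    "pi": [np.nan, "Tiger Woods", "Michael Jordan", "Roger Federer",
           np.nan, "Serena Williams", "Alex Morgan"],
}
df = pd.DataFrame(data)

# Keep values where dept is non-empty (others become NaN),
# then carry the last seen department forward.
filled = df["dept"].where(df["dept"].ne("")).ffill()

# Alternative spelling (illustrative): mask() is the inverse of where(),
# so it blanks out the rows where the condition is True.
filled_alt = df["dept"].mask(df["dept"].eq("")).ffill()

df["dept"] = filled
print(df)
```

Both variants are vectorized, so no iterrows() or itertuples() loop is needed. If the blanks may also contain whitespace, stripping the column first (for example with df["dept"].str.strip()) before comparing to "" would likely be necessary.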
Upvotes: 4