guest_42

Reputation: 33

Pandas: fill NaN based on the last available value and the next available value

I have CSV data that looks like this:

     A     B
0   x aa   
1   z aa   
2          
3   
4   x aa   
5   z bb
6   x bb
7          
8   z cc   

I would like to fill column B with the last token of the value in column A. Where A itself is empty, B should be filled only if the following holds in column A:

last_available_value_before_the_NaNs_in_A.split()[-1] == next_available_value_after_the_NaNs_in_A.split()[-1]

The desired result would be:

     A     B
0   x aa   aa 
1   z aa   aa
2          aa
3          aa
4   x aa   aa
5   z bb   bb
6   x bb   bb
7          
8   z cc   cc

data.loc[7, 'B'] will be NaN because data.loc[6, 'A'].split()[-1] == data.loc[8, 'A'].split()[-1] is False.

data.loc[5, 'B'] is 'bb' because data.loc[5, 'A'].split()[-1] == 'bb'.

Thanks for your help!

Upvotes: 3

Views: 1768

Answers (2)

a_guest

Reputation: 36249

You could compare a version that uses ffill and one that uses bfill:

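# Last token of A after forward-filling and after back-filling the gaps
f = df['A'].ffill().str.split().str[-1]
b = df['A'].bfill().str.split().str[-1]
# Keep B where the two disagree; otherwise take the shared token
df['B'] = df['B'].where(f != b, f)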
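
To check the ffill/bfill comparison against the sample in the question, here is a minimal self-contained sketch. It assumes the blank cells are read as NaN and that B starts out empty; the frame is rebuilt by hand rather than read from the original CSV, and the result is assigned back to the column instead of being modified in place.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame(
    {"A": ["x aa", "z aa", np.nan, np.nan, "x aa", "z bb", "x bb", np.nan, "z cc"]}
)
# B starts out empty (object dtype so it can hold strings later).
df["B"] = pd.Series(np.nan, index=df.index, dtype="object")

# Last token of A, once with gaps filled from above and once from below.
f = df["A"].ffill().str.split().str[-1]
b = df["A"].bfill().str.split().str[-1]

# Where both directions agree, take that token; otherwise keep B unchanged.
df["B"] = df["B"].where(f != b, f)

print(df)
```

Rows 2 and 3 sit between two values ending in 'aa', so they are filled; row 7 sits between 'bb' and 'cc', so it stays empty.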

Upvotes: 3

Will

Reputation: 1541

This is verbose to illustrate the idea.

# create a column that just holds the last token of A
df["lastval"] = df["A"].str.split().str[-1]
# fill blanks by forward-filling from the previous value
df["ffill"] = df["lastval"].ffill()
# fill blanks by back-filling from the next value
df["bfill"] = df["lastval"].bfill()

# mask the blank rows whose neighbours on both sides agree
m = df["lastval"].isnull() & (df["ffill"] == df["bfill"])
# copy lastval into B, then fill the masked gaps from the forward-filled column
df["B"] = df["lastval"]
df.loc[m, "B"] = df.loc[m, "ffill"]
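
The same idea can be wrapped in a function so that no helper columns are left on the frame. This is only a sketch: the name fill_between_matching is made up for illustration, and the sample Series is invented to show that gaps at the very start or end stay empty, since they have a neighbour on one side only.

```python
import numpy as np
import pandas as pd


def fill_between_matching(a: pd.Series) -> pd.Series:
    """Return the last token of each value, filling a gap only when the
    values on both sides of it end in the same token."""
    last = a.str.split().str[-1]
    prev_val = last.ffill()  # nearest token above each row
    next_val = last.bfill()  # nearest token below each row
    # A comparison involving NaN is False, so edge gaps are never filled.
    agree = prev_val == next_val
    return prev_val.where(agree)


# Invented sample: leading gap, matching gap, mismatched gap, trailing gap.
a = pd.Series([np.nan, "x aa", np.nan, "z aa", np.nan, "x bb", np.nan])
result = fill_between_matching(a)
print(result)
```

With the question's frame, this would be used as df["B"] = fill_between_matching(df["A"]).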

Upvotes: 0
