Reputation: 33
I have CSV data that looks like this:
      A    B
0  x aa  NaN
1  z aa  NaN
2   NaN  NaN
3   NaN  NaN
4  x aa  NaN
5  z bb  NaN
6  x bb  NaN
7   NaN  NaN
8  z cc  NaN
I would like to fill the empty cells in column B with the last word of the value in A. Where A itself is empty, B should only be filled if this holds in column A:
last_available_value_before_the_NaNs_in_A.split()[-1] == next_available_value_after_the_NaNs_in_A.split()[-1]
The wanted result would be:
      A    B
0  x aa   aa
1  z aa   aa
2   NaN   aa
3   NaN   aa
4  x aa   aa
5  z bb   bb
6  x bb   bb
7   NaN  NaN
8  z cc   cc
data.loc[7, 'B'] will be NaN because data.loc[6, 'A'].split()[-1] == data.loc[8, 'A'].split()[-1] is false.
data.loc[5, 'B'] is 'bb' because data.loc[5, 'A'].split()[-1] == 'bb'.
Thanks for your help!
Upvotes: 3
Views: 1768
Reputation: 36249
You could compare a version that uses ffill and one that uses bfill:
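# last word of A, carried forward (f) and backward (b) across the gaps
f = df['A'].ffill().str.split().str[-1]
b = df['A'].bfill().str.split().str[-1]
# keep B where the two disagree; otherwise take the shared last word
df['B'] = df['B'].where(f != b, f)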
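To check this against the sample from the question, here is a minimal self-contained sketch. The frame construction is an assumption about how the data looks once loaded (empty cells as NaN, B entirely empty). It calls .ffill() and .bfill() directly, since newer pandas releases deprecate fillna(method=...), and assigns the result back instead of relying on inplace=True.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data: empty cells become NaN.
df = pd.DataFrame(
    {"A": ["x aa", "z aa", np.nan, np.nan, "x aa", "z bb", "x bb", np.nan, "z cc"]}
)
# B starts out empty; object dtype so it can hold strings later.
df["B"] = pd.Series(np.nan, index=df.index, dtype="object")

# Last word of A, propagated forward and backward over the gaps.
f = df["A"].ffill().str.split().str[-1]
b = df["A"].bfill().str.split().str[-1]

# Where both directions agree, take that word; where they differ, keep B as is.
df["B"] = df["B"].where(f != b, f)

print(df)
```

Rows where A is present always have f equal to b, so they simply receive their own last word. Only the gap rows can differ, and those are left untouched.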
Upvotes: 3
Reputation: 1541
This is verbose to illustrate the idea.
# create a column that just holds the last word of A
df["lastval"] = df["A"].str.split().str[-1]
# fill blanks with the previous available value (forward fill)
df["ffill"] = df["lastval"].ffill()
# fill blanks with the next available value (backward fill)
df["bfill"] = df["lastval"].bfill()
# mask the rows where A is empty and both neighbours agree
m = df["lastval"].isnull() & (df["ffill"] == df["bfill"])
# start B from lastval, then fill the masked rows from the forward fill
df["B"] = df["lastval"]
df.loc[m, "B"] = df.loc[m, "ffill"]
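If the helper columns are not wanted in the result, the same idea can be wrapped in a function that works on temporary Series. This is only a sketch: the name fill_b_from_a and the sample frame are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd


def fill_b_from_a(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose B holds the last word of A, with gaps
    filled only where the words before and after the gap match."""
    last = df["A"].str.split().str[-1]  # NaN wherever A is NaN
    fwd = last.ffill()                  # previous available last word
    bwd = last.bfill()                  # next available last word
    out = df.copy()
    # Use the row's own last word if present; otherwise use the forward
    # fill, but only where it agrees with the backward fill (else NaN).
    out["B"] = last.where(last.notna(), fwd.where(fwd == bwd))
    return out


# Hypothetical sample mirroring the question's data.
sample = pd.DataFrame(
    {"A": ["x aa", "z aa", np.nan, np.nan, "x aa", "z bb", "x bb", np.nan, "z cc"]}
)
result = fill_b_from_a(sample)
print(result)
```

Because a comparison against NaN is never true, gaps at the very start or end of A (where one side has no neighbour) stay empty in B.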
Upvotes: 0