Reputation: 2570
How can I vectorise a replace by looking up a value in the row?
For a dataframe as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, np.nan, np.nan, 4),
                   (1, 2, 3, 0, 0, np.nan, 0),
                   (1, 2, 3, 4, 5, np.nan, 5)],
                  columns=['P0', 'P1', 'P2', 'P3', 'P4', 'P5', 'Last_not_NaN_value'],
                  index=['row1', 'row2', 'row3'])
Output df:
P0 P1 P2 P3 P4 P5 Last_not_NaN_value
row1 1 2 3 4 NaN NaN 4
row2 1 2 3 0 0.0 NaN 0
row3 1 2 3 4 5.0 NaN 5
How can I do something like the following? It does nothing:
df.replace(df['Last_not_NaN_value'], 0)
How can I find where each row's Last_not_NaN_value sits among the P columns and replace that cell with 0, e.g.:
P0 P1 P2 P3 P4 P5 Last_not_NaN_value
row1 1 2 3 *0* NaN NaN 4
row2 1 2 3 0 *0* NaN 0
row3 1 2 3 4 *0* NaN 5
Upvotes: 1
Views: 346
Reputation: 402483
Vectorized, as requested. Perform a broadcasted comparison, find the indices to replace, and replace them. Afterwards, you can assign the result back using a neat df[:] = ... trick.
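import numpy as np

v = df.values        # NumPy array (float64, since the columns mix int and float)
i = v[:, :-1]        # the P0..P5 columns
j = v[:, -1]         # the Last_not_NaN_value column
# compare each P column with its row's target; argmax picks the first matching column per row
v[np.arange(v.shape[0]), (i == j[:, None]).argmax(axis=1)] = 0
df[:] = v            # write the modified values back into df
df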
P0 P1 P2 P3 P4 P5 Last_not_NaN_value
row1 1.0 2.0 3.0 0.0 NaN NaN 4.0
row2 1.0 2.0 3.0 0.0 0.0 NaN 0.0
row3 1.0 2.0 3.0 4.0 0.0 NaN 5.0
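To make the broadcasting step concrete, here is a minimal sketch on the question's data (using np.nan, since the np.NaN alias was removed in NumPy 2.0). j[:, None] has shape (3, 1), so comparing it with the (3, 6) block of P columns yields a (3, 6) boolean mask, and argmax(axis=1) returns the first True per row. Because argmax also returns 0 for a row with no match at all, the sketch adds a mask.any(axis=1) guard; that guard, and building a new DataFrame instead of using df[:] = v, are my own assumptions rather than part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, np.nan, np.nan, 4),
                   (1, 2, 3, 0, 0, np.nan, 0),
                   (1, 2, 3, 4, 5, np.nan, 5)],
                  columns=['P0', 'P1', 'P2', 'P3', 'P4', 'P5', 'Last_not_NaN_value'],
                  index=['row1', 'row2', 'row3'])

v = df.to_numpy(dtype=float, copy=True)  # independent float copy of the data
block = v[:, :-1]                        # shape (3, 6): the P columns
target = v[:, -1]                        # shape (3,): Last_not_NaN_value

# (3, 6) == (3, 1) broadcasts to a (3, 6) boolean mask; NaN never compares equal
mask = block == target[:, None]
first = mask.argmax(axis=1)              # first matching column per row (0 if none match)
has_match = mask.any(axis=1)             # guard: argmax alone cannot tell "P0" from "no match"

rows = np.arange(len(v))[has_match]
v[rows, first[has_match]] = 0            # zero only rows that actually had a match
result = pd.DataFrame(v, index=df.index, columns=df.columns)

print(first)   # [3 3 4]
print(result)
```

Note that in row2 the first 0 is in P3, so the value-matching approach zeroes P3 there (already 0), which gives the same numbers as the desired output but not the same marked cell.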
Upvotes: 2
Reputation: 164663
This is one solution, though not vectorised:
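# Assumes each row's Last_not_NaN_value - 1 equals the position of its column
# (true for this sample: 4 sits in P3 and 5 in P4; row2's 0 gives -1, so it is skipped)
for i in range(6):  # P0 .. P5
    df.loc[i == (df['Last_not_NaN_value'] - 1), 'P' + str(i)] = 0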
# P0 P1 P2 P3 P4 P5 Last_not_NaN_value
# row1 1 2 3 0 NaN NaN 4
# row2 1 2 3 0 0.0 NaN 0
# row3 1 2 3 4 0.0 NaN 5
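If the goal is really to zero the last non-NaN cell of each row (in row2 of the desired output the marked cell is P4, the second of the two zeros), neither the value match nor the positional arithmetic in the loop is strictly needed. A minimal vectorised sketch, assuming that the last column is Last_not_NaN_value and that every row has at least one non-NaN P value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(1, 2, 3, 4, np.nan, np.nan, 4),
                   (1, 2, 3, 0, 0, np.nan, 0),
                   (1, 2, 3, 4, 5, np.nan, 5)],
                  columns=['P0', 'P1', 'P2', 'P3', 'P4', 'P5', 'Last_not_NaN_value'],
                  index=['row1', 'row2', 'row3'])

p_cols = list(df.columns[:-1])              # assumption: every column but the last is a P column
notna = df[p_cols].notna().to_numpy()       # (3, 6) boolean mask of present values

# Reverse the columns, find the first True, then map it back to a left-to-right position
last_pos = notna.shape[1] - 1 - notna[:, ::-1].argmax(axis=1)

vals = df[p_cols].to_numpy(dtype=float, copy=True)
vals[np.arange(len(vals)), last_pos] = 0    # zero the last non-NaN cell in each row
df[p_cols] = vals

print(last_pos)   # [3 4 4]
print(df)
```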
Upvotes: 0