Reputation: 456
How can I create a boolean mask that is True wherever the row index is greater than or equal to the index of the first non-null value in each column? For example:
df = pd.DataFrame(
[
[np.nan, np.nan, 1, 1],
[1, np.nan, np.nan, np.nan],
[np.nan, 1, np.nan, 1],
[1, 1, np.nan, 1]
],
columns=['A', 'B', 'C', 'D']
)
print(df)
     A    B    C    D
0  NaN  NaN  1.0  1.0
1  1.0  NaN  NaN  NaN
2  NaN  1.0  NaN  1.0
3  1.0  1.0  NaN  1.0
The desired boolean mask is:
array([[False, False, True, True],
[ True, False, True, True],
[ True, True, True, True],
[ True, True, True, True]])
In other words, for column A, the first non-null value occurs at index 1, so "A[1:] = True". For column B, "B[2:] = True". And so on.
I've tried to use the native pd.DataFrame.mask function:
df.mask(df.index >= df.isnull().idxmin(), df)
which raises an error because the condition array does not have the same shape as the DataFrame.
Upvotes: 2
Views: 529
Reputation: 195613
You can use .ffill():
print(df.ffill().notna().values)
Prints:
[[False False True True]
[ True False True True]
[ True True True True]
[ True True True True]]
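Why this works: forward-filling carries each column's first non-null value down to every later row, so after the fill a cell is non-null exactly when some non-null value sits at or above it. Here is a self-contained sketch that rebuilds the question's df and compares the result with the mask from the question. The `.cummax()` variant is an alternative I'm adding for comparison, not something from the question, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    [
        [np.nan, np.nan, 1, 1],
        [1, np.nan, np.nan, np.nan],
        [np.nan, 1, np.nan, 1],
        [1, 1, np.nan, 1],
    ],
    columns=["A", "B", "C", "D"],
)

# Forward-fill down each column, then test which cells are non-null.
mask_ffill = df.ffill().notna().to_numpy()

# Alternative (my addition): a running "seen a non-null value yet?" per column,
# which avoids filling any actual values.
mask_cummax = df.notna().cummax().to_numpy().astype(bool)

# The mask given in the question.
expected = np.array(
    [
        [False, False, True, True],
        [True, False, True, True],
        [True, True, True, True],
        [True, True, True, True],
    ]
)

print(np.array_equal(mask_ffill, expected))
print(np.array_equal(mask_cummax, expected))
```

Both versions work by row position, so they should not depend on what the index labels look like. A column that is entirely NaN stays False in every row.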
Upvotes: 4