L.B.

Reputation: 456

Create a DataFrame mask based on row/column condition

How can I create a boolean mask that is True wherever the row index is greater than or equal to the index of the first non-null value in each column? For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [np.nan, np.nan, 1, 1],
        [1, np.nan, np.nan, np.nan],
        [np.nan, 1, np.nan, 1],
        [1, 1, np.nan, 1]
    ],
    columns=['A', 'B', 'C', 'D']
)

print(df)

     A    B    C    D
0  NaN  NaN  1.0  1.0
1  1.0  NaN  NaN  NaN
2  NaN  1.0  NaN  1.0
3  1.0  1.0  NaN  1.0

This DataFrame should produce the following boolean mask:

array([[False, False,  True,  True],
       [ True, False,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In other words, for column A, the first non-null value occurs at index 1, so "A[1:] = True". For column B, "B[2:] = True". And so on.

I've tried to use the native pd.DataFrame.mask function:

df.mask(df.index >= df.isnull().idxmin(), df)

which raises an error because the condition array does not have the same shape as the DataFrame.

Upvotes: 2

Views: 529

Answers (1)

Andrej Kesely

Reputation: 195613

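You can use .ffill(), which propagates the last non-null value down each column, so every row at or after the first non-null value becomes non-null: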

print(df.ffill().notna().values)

Prints:

[[False False  True  True]
 [ True False  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]
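
As a cross-check, the same mask can probably also be built the way the question attempted, by comparing row positions against the first non-null index of each column. This is only a sketch, not part of the original answer: the names `first_valid` and `mask_broadcast` are made up for illustration, it assumes the default integer index, and it assumes every column contains at least one non-null value (for an all-NaN column, `idxmax` would not give a meaningful position here).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [np.nan, np.nan, 1, 1],
        [1, np.nan, np.nan, np.nan],
        [np.nan, 1, np.nan, 1],
        [1, 1, np.nan, 1],
    ],
    columns=["A", "B", "C", "D"],
)

# Approach from the answer: forward-fill, then test for non-null.
mask_ffill = df.ffill().notna().to_numpy()

# Hypothetical alternative: index of the first non-null value per column.
first_valid = df.notna().idxmax().to_numpy()

# Reshape the row index to a column vector so the comparison
# broadcasts to the full (rows, columns) shape.
mask_broadcast = df.index.to_numpy()[:, None] >= first_valid

print(np.array_equal(mask_ffill, mask_broadcast))
```

The reshape with `[:, None]` is what the original `df.mask` attempt was missing: without it, the comparison yields a one-dimensional array rather than one boolean per cell. The `.ffill()` version is shorter and also handles all-NaN columns naturally, since such a column simply stays False.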

Upvotes: 4
