mouwsy

Reputation: 1933

Check if all values of DataFrame are True

How can I check if all values of a polars DataFrame, containing only boolean columns, are True?
Example df:

import polars as pl

df = pl.DataFrame({"a": [True, True, None],
                   "b": [True, True, True],
})

The reason for my question is that sometimes I want to check if all values of a df fulfill a condition, like in the following:

df = pl.DataFrame({"a": [1, 2, None],
                   "b": [4, 5, 6],
}).select(pl.all() >= 1)

By the way, I didn't expect .select(pl.all() >= 1) to keep the null (None) in the last row of column "a"; maybe that's worth noting.

Upvotes: 3

Views: 76

Answers (2)

Hericks

Reputation: 10454

A more explicit approach could look as follows.

If null values can be ignored:

is_all_true = pl.all_horizontal(pl.all().all())
df.select(is_all_true).item()
True

Explanation. If df is of shape (n, c), then:

  • using pl.all().all() will give a boolean dataframe of shape (1, c) indicating for each column whether it only contains true values;
  • using pl.all_horizontal(pl.all().all()) will give a boolean dataframe of shape (1, 1) indicating whether all values in df are True;
  • finally, .item() is used to pick the literal value from the dataframe of shape (1, 1).

If null values cannot be ignored:

Here, pl.Expr.fill_null is used to explicitly set null values to False before performing the logic above.

is_all_true = pl.all_horizontal(pl.all().fill_null(False).all())
df.select(is_all_true).item()
False

See this answer for more details in the context of checking for null values.

Upvotes: 1

mouwsy

Reputation: 1933

As of the date of this answer, I found the following snippet most appropriate for polars:

df.fill_null(False).min_horizontal().min()

If no null values exist in df, one could omit .fill_null(False).

Credit goes to roman, who first described the min_horizontal().min() logic in this answer to a similar question about any.

Example with the df from above:

>>> df.fill_null(False).min_horizontal().min()
False
>>> df = pl.DataFrame({"a": [True, True, True],
...                    "b": [True, True, True],
... })
>>> df.fill_null(False).min_horizontal().min()
True

Upvotes: 1
